Hey folks, this looks like an awesome library! I found this SDK right as this cool contest is going on, and I'd love to get my hands on a free set of these cameras. Hope you like my idea:
Idea for a Four-Eye Application
I would create a PC-based robotic consciousness platform integrating multiple sensors with multiple levels of visual and auditory attention focus. This platform would enable exploration of pattern recognition and behavior algorithms.
Vision:
The first two cameras would be mounted high on a simple chassis as a fixed-baseline stereo vision pair. This pair of cameras would be fixed in the “56-degree field of view” mode for wide-field imaging of the robot’s environment. The imaging system would be mounted on a pan-tilt servo system capable of driving the weight of the stereo pair to cover a wide range of solid angles in the robot’s environment. The speed requirements for motion on this servo system are low, so the weight is less of an issue (low speeds require less torque).
The second two cameras would be mounted on independent fast servo systems with minimal mass (just the single cameras), and their lenses would be fixed in the “75-degree field of view” mode for close-up inspection of an object. These individual cameras could be driven to track quickly moving objects with high precision and to provide 2D close-up texture inspection. They might even be mounted at the ends of longer arm-like structures so that they could be moved into complex positions in space. Basically, this would give the robot the capacity for selective high resolution in its visual field (like the human eye, where we have high-resolution color vision at the targetable center and broad, low-resolution peripheral vision with a relatively static field determined by head pose).
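To make the targeting step concrete, the control software has to convert a target's pixel offset from image center into pan/tilt corrections for the servos. Here is a minimal sketch of that mapping, assuming a simple pinhole model; the frame size and FOV constants are placeholders (a real calibration would come out of OpenCV):

    #include <cmath>

    const double kPi = 3.14159265358979;
    // Hypothetical constants for one camera in its 75-degree FOV mode.
    const double kFrameWidth  = 640.0;   // pixels
    const double kFrameHeight = 480.0;   // pixels
    const double kHorizFovDeg = 75.0;    // horizontal field of view
    // Pinhole model: focal length in pixels derived from the horizontal FOV.
    const double kFocalPix =
        (kFrameWidth / 2.0) / std::tan((kHorizFovDeg / 2.0) * kPi / 180.0);

    // Convert a target's pixel position into incremental pan/tilt
    // corrections (in degrees) that would re-center the target.
    void pixelToPanTilt(double px, double py, double& dPanDeg, double& dTiltDeg) {
        double dx = px - kFrameWidth / 2.0;    // offset from image center
        double dy = py - kFrameHeight / 2.0;
        dPanDeg  =  std::atan2(dx, kFocalPix) * 180.0 / kPi;
        dTiltDeg = -std::atan2(dy, kFocalPix) * 180.0 / kPi;  // +tilt = up
    }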
Audio:
The first pair of linked stereo vision cameras would be mounted pointing in the same direction, but rotated 90 degrees from one another about the lens axis. That is, one camera would image the stereo field at 640x480 and the other at 480x640. This would orient the audio arrays orthogonally, so that digital beamforming could produce sound localization (or spatial sound filters) in two dimensions (left-right and up-down), giving the robot the capability of complex acoustic imaging of its environment over a broad range of solid angles. The difficulty would be exact time synchronization of the two sets of 4 microphones, but it might be possible to hack the audio sample clock (16 kHz) on one board and slave its set of 4 mics to the other set. The physical separation of the two microphone arrays would also give increased spatial resolution on a sound source.
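As a sketch of the localization math, a single mic pair gives a bearing from the time difference of arrival (TDOA): cross-correlate the two channels, find the lag with the peak, and convert that lag to an angle. Everything below is an assumption for illustration (synchronized sample buffers, a 0.10 m baseline between the outer mics of the two arrays); note that at 16 kHz the lag resolution is coarse, which is exactly why the wide physical separation between the two arrays helps:

    #include <cmath>
    #include <cstddef>
    #include <vector>

    const double kSampleRateHz = 16000.0;  // camera audio sample clock
    const double kSpeedOfSound = 343.0;    // m/s at room temperature
    const double kBaselineM    = 0.10;     // assumed spacing between outer mics

    // Brute-force cross-correlation: the lag (in samples) at which channel b
    // best lines up with channel a, searched over +/- maxLag samples.
    int peakLag(const std::vector<double>& a, const std::vector<double>& b,
                int maxLag) {
        int bestLag = 0;
        double bestScore = -1e300;
        for (int lag = -maxLag; lag <= maxLag; ++lag) {
            double score = 0.0;
            for (std::size_t i = 0; i < a.size(); ++i) {
                int j = static_cast<int>(i) + lag;
                if (j >= 0 && j < static_cast<int>(b.size()))
                    score += a[i] * b[static_cast<std::size_t>(j)];
            }
            if (score > bestScore) { bestScore = score; bestLag = lag; }
        }
        return bestLag;
    }

    // Convert the peak lag into a bearing (radians) off the array broadside.
    double bearingFromLag(int lagSamples) {
        double dt = lagSamples / kSampleRateHz;       // arrival time difference
        double s  = kSpeedOfSound * dt / kBaselineM;  // sin(bearing)
        if (s >  1.0) s =  1.0;
        if (s < -1.0) s = -1.0;
        return std::asin(s);
    }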
A single speaker system would be mounted below the stereo pair (i.e., acting as a mouth) and could be used to generate pulses for reflected-sound/sonar mapping or for speaking into its environment in response to a stimulus. This speaker could simply be plugged into the PC's speaker port.
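To sketch the sonar arithmetic: with the speaker emitting a click and the microphones sampling at 16 kHz, distance follows from the echo round trip, d = c*t/2. An echo arriving 10 ms after the pulse puts a surface at roughly 343 m/s * 0.010 s / 2 ≈ 1.7 m, and each sample of delay corresponds to about 343 / 16000 / 2 ≈ 1 cm of range, which bounds the resolution of this mode.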
Sensor Modifications:
The microphone sensor arrays on the independently mounted cameras might provide redundant information to the 2D phased array on the robot's stereo vision head. A microphone merely outputs a time-varying voltage encoding sound pressure, so I assume the microphones could be removed from the internal board and replaced with a variety of signals providing a voltage level that encodes some metric from the world. Essentially, the microphone inputs can be converted into general-purpose analog inputs for acquiring from a variety of inexpensive sensors. One may have to bypass AC-coupling filters on the input, and it may be that the PC "volume" control on that sound interface drives an onboard amplifier. Some examples of sensors that could replace the microphones (see the sketch after this list):
- Ultrasonic range finders: several available systems generate a simple voltage encoding distance instead of the high-frequency sound information. These would provide a high-resolution distance measure to complement the stereo data.
- Temperature sensors: either remote-targeting IR beam systems that output a voltage for the temperature at the camera's target, or a simple resistive temperature device powered off the USB 5 V bus for measuring local ambient temperature.
- Contact sensors: feedback from a variety of contact-based analog pressure sensors (for tactile feedback) or switch closures indicating some other manner of physical interaction with the robot's systems.
- Humidity sensors and air flow meters.
- Ambient light sensors to determine illumination source location.
- IR sensors (e.g., for remote controls or IR communications).
- Capacitive proximity sensors: many canned systems generate a voltage encoding the "capacitance" of a system based on a couple of electrodes and a roughly 120 kHz excitation voltage. These are used, for example, to detect people or child seats for airbag deployment in cars.
- Magnetic sensors: could detect the Earth's magnetic field or measure currents flowing in wires, among other applications; this could assist in navigation.
- Tilt sensors or x/y/z accelerometers: could provide analog signals encoding the orientation of the robot.
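As a sketch of how one of these hacked analog channels might be read: if a rangefinder replaces a microphone and its DC level survives (i.e., the AC-coupling capacitor is bypassed), the 16 kHz "audio" samples become voltage readings that map linearly to distance. All constants here are hypothetical and would depend on the actual sensor and input path:

    #include <cstdint>

    // Hypothetical scaling for an analog ultrasonic rangefinder wired into a
    // hacked microphone input (AC coupling bypassed so the DC level survives).
    const double kFullScaleVolts = 3.3;   // assumed full scale on the mic path
    const double kVoltsPerMeter  = 0.38;  // assumed sensor output slope

    // Convert one signed 16-bit "audio" sample into a distance estimate.
    double sampleToMeters(int16_t sample) {
        double volts = (sample + 32768) * kFullScaleVolts / 65535.0;  // 0..3.3 V
        return volts / kVoltsPerMeter;
    }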
Supplementary Systems:
I would add a small 8-bit Atmel AVR controller-based system in parallel with the cameras that would communicate with the PC-based control application over a USB serial stream. This controller would provide the control signals to drive the RC servo systems to target angles in space. In general, this system would allow for the creation of complex responses to the signals acquired from the cameras and microphones.
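A minimal sketch of that controller firmware, in Arduino-style C++; the pin assignments and the simple "channel angle" line protocol are my assumptions, not a fixed design:

    #include <Servo.h>

    // Pan/tilt for the stereo head plus one fast servo per independent eye.
    Servo servos[4];
    const int kServoPins[4] = {3, 5, 6, 9};  // assumed PWM-capable pins

    void setup() {
        Serial.begin(115200);                // USB serial link to the PC app
        for (int i = 0; i < 4; ++i)
            servos[i].attach(kServoPins[i]);
    }

    // Assumed line protocol from the PC: "<channel> <angle>\n", e.g. "2 135".
    void loop() {
        if (Serial.available()) {
            int channel = Serial.parseInt();
            int angle   = Serial.parseInt();
            if (channel >= 0 && channel < 4)
                servos[channel].write(constrain(angle, 0, 180));
            while (Serial.available() && Serial.read() != '\n') {}  // flush line
        }
    }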
Software:
A C++ based application platform would be constructed to acquire from the cameras using the CL-eye drivers and the Windows audio interface. OpenCV algorithms would be leveraged for stereo vision, and 3D kinematics calculations (based on the known geometry of the cameras and the RC servos) would drive the selective focus. In parallel with the video, a wealth of analog signals from the audio and other sensors would be acquired (roughly 8 kHz of usable bandwidth from the 16 kHz sampling). All of these input data and output control modalities would be fed into a common space where modular, parallel analysis could be conducted.
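Here is a minimal sketch of the stereo depth step using OpenCV's block-matching stereo correspondence. The cv::VideoCapture calls are stand-ins (the real grab would go through the CL-eye driver), and calibration/rectification is omitted for brevity:

    #include <opencv2/opencv.hpp>

    int main() {
        // Stand-ins for the CL-eye capture path; device indices are placeholders.
        cv::VideoCapture left(0), right(1);
        // Block-matching stereo: 64 disparity levels, 15-pixel match window.
        cv::Ptr<cv::StereoBM> stereo = cv::StereoBM::create(64, 15);

        cv::Mat frameL, frameR, grayL, grayR, disparity, display;
        while (left.read(frameL) && right.read(frameR)) {
            cv::cvtColor(frameL, grayL, cv::COLOR_BGR2GRAY);
            cv::cvtColor(frameR, grayR, cv::COLOR_BGR2GRAY);
            stereo->compute(grayL, grayR, disparity);  // CV_16S, scaled by 16
            disparity.convertTo(display, CV_8U, 255.0 / (64 * 16));
            cv::imshow("disparity", display);
            if (cv::waitKey(1) == 27) break;           // Esc to quit
        }
        return 0;
    }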
Once such an application platform was in place, there would be many multi-modal information streams (color vision, depth, localized sound, temperature, proximity, etc.) to work with. At that point, I would have a real-world platform for implementing pattern recognition and machine learning algorithms, from which I could begin testing out AI ideas. A simple multi-core PC would drive the application so that a broad range of data could be stored on disk and data structures could be created for search and recall of stored pattern information.
Then the real fun begins, and I would start testing training methods for teaching this critter how to play in the real world. Audio and visual cues could be derived from the environment to provide positive and negative evaluations of the robot's actions; more complex actuators could be added to the system, and so on. Various levels of signal processing would feed data into the learning algorithm, all implemented in parallel (much like the human brain). Language could be detected from the audio, etc.
Since it would be driven by a Wi-Fi enabled PC, such a system could have a wealth of information access on the back end. For example, it could focus in on and identify an object, receive a label via audio (e.g., "that is a cat"), and then search for more examples in Google Images or elsewhere to further solidify its model of what a cat is. A commercially available GPS unit could be connected through a serial port interface for robot navigation. Eventually you could ask it simple questions like "what's the weather like"… The possibilities extend endlessly, all enabled by these extraordinarily cheap multi-mode sensor and acquisition systems.
The system would probably have to be a desktop PC, in order to allow the installation of 4 independent USB host controllers on a PCIe bus to handle the high data rates from the individual cameras simultaneously. A laptop probably wouldn't easily provide such bandwidth over USB, since a single one of these cameras can consume a large fraction of a 480 Mb/s USB 2.0 bus on its own.
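To put rough numbers on that (assuming uncompressed YUV422 video in the 640x480, 60 fps mode): 640 × 480 pixels × 2 bytes/pixel × 60 fps ≈ 35 MB/s ≈ 295 Mb/s, which is most of USB 2.0's 480 Mb/s theoretical rate, and beyond what a shared bus sustains in practice once protocol overhead is counted. One camera per host controller is the safe budget.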
In summary,
- 4 Eye cameras
- Some hardware to mount the stereo vision pair (aluminum/wood plus clamps and other cheap bits)
- RC servo motors in a pan-tilt configuration (prices driven down by the RC helicopter/car market)
- Optional extra sensor components (generally a few dollars each) plus associated PCBs and such; many times these sensor chips can be sampled for free from the vendor
- A small, cheap dev board containing a controller and wires to drive the system motors (see Arduino)
- A multi-core PC (laptop or otherwise) capable of driving the system and providing further back-end information access
All components are readily accessible on a hobbyist budget (i.e., under $500 total, not counting the PC). This opens up endless possibilities for cool pattern recognition and behavioral systems work on a sensor-rich platform.