Video Latency
Posted: 04 February 2010 08:51 AM   [ # 11 ]
Administrator
Total Posts:  585
Joined  2009-09-17
igor1960 - 04 February 2010 02:19 AM

640x480 @ 15fps: 6% CPU   Latency: ~130ms or around 2 frames;  rendering delay ~8ms;  capture/transport delay ~122ms
640x480 @ 30fps: 11% CPU  Latency: ~66ms or around 2 frames;  rendering delay ~7ms;  capture/transport delay ~59ms
640x480 @ 40fps: 16% CPU  Latency: ~52ms or around 3 frames;  rendering delay ~6.5ms;  capture/transport delay ~45.5ms
640x480 @ 50fps: 19% CPU  Latency: ~60ms or around 3 frames;  rendering delay ~10ms;  capture/transport delay ~50ms
640x480 @ 60fps: 30% CPU  Latency: ~96ms or around 6 frames;  rendering delay ~26ms;  capture/transport delay ~70ms
640x480 @ 75fps: 50% CPU  Latency: ~96ms or around 7 frames;  rendering delay ~32ms;  capture/transport delay ~68ms

There is somehow a huge jump in CPU usage from 60fps to 75fps, but I’ve already reported this in one of the previous threads about the CL SDK…
The jumps in CPU usage at 60 and 75fps could also be explained: as the latency grows to 6-7 frames, my algorithm itself spends more time checking for the blank frame’s arrival…

I assume you run a dual-core processor. In that case, according to your last result (75fps), the core that was doing the processing was running at 100%. I am really curious to see your algorithm for checking blank frames; it seems to me that you are spending way too much time on that. Under such conditions I would totally reject your latency finding @ 75fps.

If you want to do this research that’s fine, but as a researcher one of the basic things you have to know is that you must objectively consider and examine all the facts and not jump to conclusions prematurely. It seems to me that you have a tendency to do just that (jump to conclusions).

Your result at 75fps was wrong because your CPU usage was at 100%, and under such conditions you cannot possibly expect to make any accurate measurements, especially when you’re going through layers and layers of API calls on a system that is not even soft real-time.

So, as I said, I’m not sure why you are performing these tests this way. For a more realistic average latency test you would have to present a blank frame at a random time, capture it, and measure its time of arrival. Then you would sum those delays up and average them. Since your monitor is running at 75Hz (it redraws at a periodic interval), this is already not random and your latency spread will be skewed. However, the best test (which involves custom hardware) would be to pull the vsync signal from the camera and measure the time difference between this signal and the time your captured frame is delivered to your buffer.

As I said before, I would like to see how you are checking for the blank frame (your algorithm), since obviously your CPU is not fast enough to do this in real time. So optimizing it will probably drop the CPU usage and give you more accurate results.

AlexP

Posted: 04 February 2010 10:58 AM   [ # 12 ]
Jr. Member
Total Posts:  49
Joined  2010-01-15

Alex,

I’m not jumping to conclusions.
Since I have no way to know, and you are not publishing details of your driver/firmware implementation, I simply asked you whether there is any buffering on either side of the USB transport.
My original tests (maybe not kosher enough) showed results that could only be explained by some possible buffering, and that’s why I asked.
I never implied that anything wrong is going on with the camera software, and I don’t want to see you upset and/or defensive and/or lecturing me. I just wanted to know where the bottleneck might be.

Now that you have told me there is no buffering involved, I fully understand that my measuring algorithm doesn’t work properly at high frame rates and that I should rethink it.

In fact, what I’m talking about here is not just a “hypothetical” system but a real one, which involves, for example, some targeting, and for which the full latency including rendering is important. Just imagine something like a submarine’s periscope that has a camera above the water surface and a display below. We obviously want as little latency as possible between camera capture and full display rendering in this case.

As a result of your answer, I now understand that the bottleneck is probably related to the limited refresh rate of the monitor. It looks like my blank frame might not be displayed on the monitor at all: to display every captured frame on a 75Hz monitor while capturing at 75fps, the whole system must be synchronized in exact order and have no fluctuations at all. And you are right that this is probably impossible to achieve (using DirectX at least).

So, I have to rethink ways of reimplementing the system, abandon the idea, move to higher-frequency monitors, or split processing/rendering into separate threads running on multiple cores/CPUs.

Profile
 
 
Posted: 04 February 2010 11:22 AM   [ # 13 ]
Administrator
Total Posts:  585
Joined  2009-09-17
igor1960 - 04 February 2010 10:58 AM


I am not upset and I’m not trying to lecture you on anything; I’m just trying to help you set up a more realistic test and get real results from it. I published pretty much all of the details about the capture driver. What specifically are you interested in me disclosing? As I said before, the camera has no buffering capabilities, and any data received over USB is transferred directly to the user buffer (as fast as it will go under Windows) once a complete frame is received. This is then copied to the DS buffer in the FillBuffer function. So any extra buffering that might occur between my code and your test app will be in DirectShow, since I now allocate (based on your suggestion) 10 frame buffers in the DecideBufferSize function. Are you getting the most recent buffer from the capture filter every time?

So, here is how I would modify your test:

1. Set up the test at 320x240 resolution (this will eliminate the CPU issue).
2. Start the camera capture graph.
3. Set your screen to black.
4. After a random (100-300ms) time, change the display to white and start timing until the camera recognizes the white frame.
5. Record the time difference (the latency).
6. Generate a random number (300-700ms) and wait that many ms.
7. Repeat from step 3.

Once you have, let’s say, 100 latency numbers, find the min, max, and average.

Btw, in your code that checks for the blank frame, to speed things up, don’t scan the whole image for the change but just a subset of it. For example, you can check just a 16x16-pixel block in the middle of your image (see the sketch below).
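Here is a minimal C++ sketch of that measurement loop. SetScreenColor and GetLatestFrame are hypothetical stand-ins for your own renderer window and capture-filter access (the loop can only terminate once they are wired up); only the timing logic and the 16x16 center check are meant literally. QueryPerformanceCounter is used because millisecond-level resolution is needed.

#include <windows.h>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <algorithm>

// Hypothetical stand-ins -- replace with your own renderer window and
// your DirectShow capture filter access.
static void SetScreenColor(bool /*white*/) { /* paint the test window */ }
static void GetLatestFrame(unsigned char* /*buf*/) { /* newest captured frame */ }

static const int W = 320, H = 240;

// Check only a 16x16 block in the middle of the (8-bit grayscale) frame.
static bool CenterIsWhite(const unsigned char* buf)
{
    long sum = 0;
    for (int y = H / 2 - 8; y < H / 2 + 8; ++y)
        for (int x = W / 2 - 8; x < W / 2 + 8; ++x)
            sum += buf[y * W + x];
    return sum / 256 > 200;                 // mean brightness above threshold
}

int main()
{
    std::vector<unsigned char> frame(W * H);
    std::vector<double> lat;
    LARGE_INTEGER freq, t0, t1;
    QueryPerformanceFrequency(&freq);

    for (int run = 0; run < 100; ++run)
    {
        SetScreenColor(false);              // step 3: black screen
        Sleep(100 + rand() % 201);          // step 4: random 100-300ms wait
        QueryPerformanceCounter(&t0);
        SetScreenColor(true);               // flip to white and start timing
        do {
            GetLatestFrame(frame.data());   // poll the most recent frame
            QueryPerformanceCounter(&t1);
        } while (!CenterIsWhite(frame.data()));
        lat.push_back((t1.QuadPart - t0.QuadPart) * 1000.0 / freq.QuadPart);
        Sleep(300 + rand() % 401);          // step 6: random 300-700ms pause
    }

    double avg = 0;
    for (size_t i = 0; i < lat.size(); ++i) avg += lat[i];
    printf("min %.2f ms  max %.2f ms  avg %.2f ms\n",
           *std::min_element(lat.begin(), lat.end()),
           *std::max_element(lat.begin(), lat.end()),
           avg / lat.size());
    return 0;
}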

Hopefully this will give you some better and more stable results.

AlexP

Posted: 04 February 2010 01:48 PM   [ # 14 ]
Jr. Member
Total Posts:  49
Joined  2010-01-15

Alex,

Again, thanks. I’ve got my answer: no buffering, and that’s all I need.
As to the algorithm: what you are suggesting is exactly what I’m already doing.

However, there is one problem, and as you mentioned before, it is the main reason why this algorithm doesn’t work perfectly.
The reason is that the monitor is used to trigger the newly captured image, so the monitor’s refresh rate should be much higher than the capturing FPS.
So, when we are talking about capturing at 30fps, a 75Hz monitor is good enough not to introduce a big delay: even if rendering has to wait for a full monitor update (the time between sequential vsyncs), that monitor delay is less than the integration time required for capturing the next frame.
However, when we are talking about capturing at 75fps, we would ideally need a 150Hz monitor (or even higher) to avoid a large monitor delay. As I’m using the same 75Hz monitor, even if the system manages to render each and every one of the 75 captured frames per second (which might not be possible, as it would require extremely precise sync between capturing and rendering), the monitor delay becomes significant relative to capturing and equals the time required to capture one extra frame (at 75fps, 1000/75 ≈ 13.3ms). So even on a perfectly synced system we should expect one extra frame of delay when using a monitor with a refresh rate equal or close to the capturing rate.

However, the system is not “perfect” and the rendering part is not precisely synced to capturing. In fact, the default system renderers (VMR included), when running with capture at 75fps, show around 20-30fps of frame droppage.
So I wrote my own highly optimized multithreaded renderer that is supposed to drop fewer frames. However, even this one, when I run capture at 60fps+, shows at most 56-58fps actually rendered. As explained above, this is simply due to the limit of the monitor’s refresh rate.

The only way to reach exactly 75fps rendered with 75fps captured on a 75Hz monitor is obviously to have the full processing path from capture to presentation timed exactly, so that the next captured/processed frame arrives at Present right at the monitor’s vsync. And that is probably extremely hard to do.
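(For reference, Direct3D 9 at least lets you observe the vblank yourself through GetRasterStatus. A minimal fragment, assuming an already-created IDirect3DDevice9 and leaving all device setup aside:)

#include <d3d9.h>
#pragma comment(lib, "d3d9.lib")

// Spin until the monitor enters its vertical blank, then present the frame.
// 'dev' is assumed to be an already-created and initialized device.
void PresentOnVSync(IDirect3DDevice9* dev)
{
    D3DRASTER_STATUS rs;
    do {
        dev->GetRasterStatus(0, &rs);   // poll the current scanline position
    } while (!rs.InVBlank);             // wait for the vertical blank interval
    dev->Present(NULL, NULL, NULL, NULL);
}

Busy-waiting like this burns a whole core, though, which is part of why doing it cleanly is so hard.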

An alternative measuring solution might be your LED suggestion, which would completely remove rendering from the equation. However, as I said, I’m interested in finding the latency that includes rendering.

Posted: 04 February 2010 02:04 PM   [ # 15 ]
Administrator
Total Posts:  585
Joined  2009-09-17

The LED Method
Here is another thing you could try that would more closely determine the camera latency numbers.
If you are versed in hardware, you could wire up a simple circuit consisting of one LED and a resistor to your parallel port.

Something like this:
[see the attached image lpt_leds.jpg]
Note: You would wire only LED #1

Then, using for example the WinRing0 library, you can turn that LED on and off in software with zero (or very low) latency.
Now you can do the same experiment I described earlier, where you randomly turn on the LED and capture it with the camera.
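The toggle itself is only a few lines. A minimal sketch, assuming LPT1’s data register sits at the conventional 0x378 address (verify this for your machine) and using WinRing0’s port I/O calls (check the exact names against the OlsApi.h that ships with your WinRing0 version):

#include <windows.h>
#include "OlsApi.h"                   // WinRing0 header; link against its DLL

static const WORD LPT1_DATA = 0x378;  // conventional LPT1 data register

int main()
{
    if (!InitializeOls())             // loads the WinRing0 kernel driver
        return 1;
    WriteIoPortByte(LPT1_DATA, 0x01); // data pin D0 high -> LED #1 on
    Sleep(1000);
    WriteIoPortByte(LPT1_DATA, 0x00); // all data pins low -> LED off
    DeinitializeOls();
    return 0;
}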

This is, by the way, what I’m planning to do to test the camera latency delay.

AlexP

Image Attachments
lpt_leds.jpg
Posted: 04 February 2010 03:40 PM   [ # 16 ]
Jr. Member
Total Posts:  49
Joined  2010-01-15

Alex,

Right, that would work for measuring the camera latency with one-frame precision.
BTW: remember we talked about enabling the red camera LED.
I assume sending the LED-on signal through USB might have latency too.
And if this latency is more than one frame, comparing the results of the external-LED method against the same algorithm run with the internal LED would give us the latency of the internal LED.

However, as I said before, I’m only considering a method that includes full rendering. Therefore, I don’t see any other way than including the monitor output in the loop.
So I should look into some kind of optimization (if possible) that would render all arriving samples at exactly, or at least very close to, the monitor’s FPS, preferably without skipping even one sample. Unfortunately, Microsoft’s and the other DirectShow renderers I know of are optimized to deliver streaming video with buffering etc., and obviously at or around 30fps.

This camera is a great tool to consider as a base for such an implementation.
So, now that I know there is no internal buffering, I can think about a possible implementation. This might be hard though (especially for the 3 remaining neurons in my brain)...

Posted: 04 February 2010 04:17 PM   [ # 17 ]
Administrator
Total Posts:  585
Joined  2009-09-17

Yes, the red LED control functionality will be added to the API in the next release of the SDK. And yes, that is a simple way to measure the latency of the camera, since the latency of turning the camera LED on and off should be constant. The LED control data is sent internally through the control pipe (a very small packet), and as such the API does not return until the data is received by the target device. So it is safe to assume that the LED changes its state by the time the function returns, which pretty much eliminates any LED control latency. Then it’s just a matter of measuring the brightness change of the camera image. You could use a simple mirror to let the camera directly see its own LED.

So knowing the camera capture latency might help you optimize your DirectShow code to minimize the total latency of the capture-display loop.

Also, you need to include the monitor response time in your equation, because modern monitors have pretty complex circuitry that drives the physical display asynchronously. So even though you might know the precise time of your graphics card’s VSync, that does not mean the monitor’s VSync happens at the same time (that was true in the old days of CRT displays). I know for a fact that on some models of LCD displays the whole image is first loaded into the internal frame memory of the display controller and then rendered out to the physical panel asynchronously with the incoming image data. This is why, even though it should be pretty much constant, the display response latency is also an important factor to consider.

So since you are playing with high-performance capture/rendering, maybe DirectShow, as you said, is not the best-suited framework for this. You might want to consider using DirectX fully and getting away from the DirectShow framework completely.

AlexP

Posted: 04 February 2010 09:42 PM   [ # 18 ]
Administrator
Total Posts:  585
Joined  2009-09-17

Internal LED Capture Latency Test

I implemented a simple latency test program (it will be included in the SDK) that uses the PS3Eye’s internal red LED to trigger the latency timing. The program uses two threads: one captures frames and checks them for the LED light, and the other turns on the camera’s internal LED after a random interval.
The camera was then put in a dark shoebox with a mirror in front of it (so that it can clearly see its own red LED). The blue LED was masked off.
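For anyone curious how such a program can be structured, here is a sketch of the two-thread layout. The camera calls (CameraGetFrame, CameraSetLED) are stubbed, hypothetical stand-ins rather than the actual SDK interface (the LED API has not shipped yet); the threading and timing arrangement is the point.

#include <windows.h>
#include <cstdio>
#include <cstdlib>

// Hypothetical camera calls, stubbed so the sketch is self-contained --
// not the shipped CL Eye interface.
typedef void* CameraHandle;
static void CameraGetFrame(CameraHandle, unsigned char*) { /* block for next frame */ }
static void CameraSetLED(CameraHandle, bool) { /* control-pipe call; returns once the device has the packet */ }

static volatile LONGLONG g_ledOnTime = 0;   // written by trigger thread, read by capture thread

static DWORD WINAPI TriggerThread(LPVOID cam)
{
    Sleep(300 + rand() % 401);              // random delay so the trigger is uncorrelated with capture
    LARGE_INTEGER t;
    QueryPerformanceCounter(&t);
    g_ledOnTime = t.QuadPart;               // timestamp first, then fire the LED
    CameraSetLED((CameraHandle)cam, true);
    return 0;
}

int main()
{
    CameraHandle cam = 0;                   // stub: create and start the camera here
    static unsigned char frame[640 * 480];  // 8-bit grayscale
    LARGE_INTEGER freq;
    QueryPerformanceFrequency(&freq);
    HANDLE h = CreateThread(NULL, 0, TriggerThread, cam, 0, NULL);

    for (;;)
    {
        CameraGetFrame(cam, frame);         // capture thread: wait for the next frame
        if (g_ledOnTime == 0) continue;     // LED not fired yet
        long sum = 0;                       // brightness of a 16x16 center patch
        for (int y = 232; y < 248; ++y)     // (camera faces a mirror in a dark box)
            for (int x = 312; x < 328; ++x)
                sum += frame[y * 640 + x];
        if (sum / 256 > 60)                 // LED light detected -> stop the clock
        {
            LARGE_INTEGER t;
            QueryPerformanceCounter(&t);
            printf("latency: %.3f ms\n",
                   (t.QuadPart - g_ledOnTime) * 1000.0 / freq.QuadPart);
            break;
        }
    }
    WaitForSingleObject(h, INFINITE);
    CloseHandle(h);
    return 0;
}

Repeat the run many times (randomizing the wait each time) and aggregate the min/max/average, as in the results above.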

Here are the latency results (each one was run for a few minutes):

Grayscale Mode

- 640x480 @ 75fps Latency Min: 11.6151ms   Max: 25.565ms   Avg: 18.2541ms
- 640x480 @ 60fps Latency Min: 13.9225ms   Max: 30.373ms   Avg: 22.0655ms
- 640x480 @ 30fps Latency Min: 23.1228ms   Max: 56.288ms   Avg: 39.7403ms
- 640x480 @ 15fps Latency Min: 38.0911ms   Max: 104.65ms   Avg: 70.1092ms

Color Mode

- 640x480 @ 75fps Latency Min: 15.3318ms   Max: 28.234ms   Avg: 21.5017ms
- 640x480 @ 60fps Latency Min: 17.2339ms   Max: 33.576ms   Avg: 25.2352ms
- 640x480 @ 30fps Latency Min: 23.5517ms   Max: 56.527ms   Avg: 38.9832ms
- 640x480 @ 15fps Latency Min: 36.9829ms   Max: 102.48ms   Avg: 68.4732ms

From the results we can see that both the grayscale and color modes give consistent results.
We can conclude that the average latency at a given frame rate is roughly 1.5 frame times (e.g. at 75fps one frame takes 1000/75 ≈ 13.3ms, and 1.5 × 13.3ms ≈ 20ms, right around the measured 18.3-21.5ms averages).

AlexP

Posted: 05 February 2010 01:57 AM   [ # 19 ]
Jr. Member
Total Posts:  49
Joined  2010-01-15

Alex, you are great.
And the results are great too.
I hope you will share the source code of that program in your distribution.

Posted: 07 February 2010 10:26 AM   [ # 20 ]
New Member
Total Posts:  23
Joined  2010-01-18

Haha, pretty nice!
Even though I’m not that experienced in the coding department, I’ve followed your debate and it’s cool to see the conclusion.
Just my two cents...

@alex: this is definitely the wrong place for my question, but I’ll ask nevertheless ^^
Will any kind of prolonged exposure mode be included in the API when the next SDK is released?

Cheers
