Seems CPU usage is too high
Posted: 15 January 2010 02:25 PM
Jr. Member
Total Posts:  49
Joined  2010-01-15

Alex/Others,

Just tried your CL-Eye platform SDK. Great stuff: you are great.

However, several questions:

1. It seems to me that, for what the system is supposed to do inside CLEyeCameraGetFrame, CPU utilization is too high. For example, 640*480 @ 75fps drives it to 75% CPU load on my dual-core 2.8 GHz system. Compared to other cameras, at least, that is extremely high. What exactly is it spending CPU on: debayering or IO? If IO, is overlapped IO possible, and have you tried it? If debayering, how optimized is it?
2. If debayering is done inside the driver: is it possible to introduce a different format that returns raw Bayer RGB (whether 8 bits and/or maybe 10 bits)? That would be extremely important, if possible.
3. Is it possible to provide an interface to your system using a standard callback mechanism instead of polling? That seems obvious and should be the solution of choice. Then you could just fire an event through your callback, possibly reusing the same buffer.
4. DirectShow Video Capture driver: why does it expose just a reduced set of the available frame rates? Why not all of them? I read somewhere that this project is Open Source: where could I find it for custom modification? Also, somehow the DirectShow capture filter doesn’t show the same high CPU utilization as CLEyeCameraGetFrame: Alex, are you hiding some faster interface? LOL.
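
For a sense of scale, the data volume behind questions 1 and 2 can be worked out with a few lines of C++. This is only a sketch: `bytesPerSecond` is a hypothetical helper, the 4 bytes per pixel is taken from the sizeof(RGBQUAD) buffer sizing used in the capture code posted in this thread, and the 1 byte per pixel for an 8-bit raw Bayer frame is an assumption about the sensor format, not something confirmed here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: bytes produced per second for a given frame
// geometry, pixel size, and frame rate.
std::uint64_t bytesPerSecond(std::uint32_t width, std::uint32_t height,
                             std::uint32_t bytesPerPixel, std::uint32_t fps)
{
    assert(width > 0 && height > 0 && bytesPerPixel > 0);
    // Widen before multiplying so the product cannot overflow 32 bits.
    return static_cast<std::uint64_t>(width) * height * bytesPerPixel * fps;
}

// 640x480 @ 75 fps, 4 bytes per pixel (RGBQUAD): 92,160,000 bytes/s
// 640x480 @ 75 fps, 1 byte per pixel (assumed):  23,040,000 bytes/s
```

Under those assumptions, converted output at 75fps is roughly 92 MB/s that has to be written by the conversion code, four times what an 8-bit raw format would move.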

Thanks,
Igor

Posted: 16 January 2010 02:22 PM   [ # 1 ]
Administrator
Total Posts:  585
Joined  2009-09-17

Hey Igor,

First of all, thanks for visiting and for your comments.

1. I agree, 75% CPU usage is way too high. Could you share the code of your main capture loop? And you are right: the driver internally uses overlapped IO for all of the USB transfers, as well as an event-based notification system and buffer reuse. Data transfers/copying are reduced to a minimum and, where needed, are done via SIMD code. The debayering and color conversion algorithms are fully implemented in hand-optimized SIMD code (I doubt it could be faster).

2. The current driver code does not expose raw CMOS sensor data, since you would lose the ability to use the camera/lens image processing functionality. This is, by the way, something I exposed in past versions of the driver, but I saw that no one was really using it.

3. The system uses no polling mechanism at all, and it pretty much works the way you described. The frame is captured, an event is signaled, the user capture thread is awakened, and the data is converted and returned.

4. The DS driver is geared towards webcam use, so it would not benefit from these high capture rates. Furthermore, the frame rate was dropped for compatibility with software such as Skype, MSN, etc.
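
The flow described in point 3 (frame captured, event signaled, waiting thread woken) can be sketched in portable C++ with a condition variable. This is an illustration only: `FrameSlot`, `publish`, and `waitFrame` are hypothetical names, and the code is not the CL-Eye driver's implementation, which is not shown in this thread.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <vector>

// Hypothetical single-frame mailbox between a producer and a consumer.
class FrameSlot {
public:
    // Producer side: store a frame and wake one waiting consumer.
    void publish(const std::vector<std::uint8_t>& frame)
    {
        assert(!frame.empty());
        {
            std::lock_guard<std::mutex> lock(mutex_);
            frame_ = frame;
            ready_ = true;
        }
        cv_.notify_one();
    }

    // Consumer side: block until a frame is published or the timeout
    // expires. The caller is blocked inside wait_for rather than spinning.
    bool waitFrame(std::vector<std::uint8_t>& out,
                   std::chrono::milliseconds timeout)
    {
        std::unique_lock<std::mutex> lock(mutex_);
        if (!cv_.wait_for(lock, timeout, [this] { return ready_; }))
            return false;      // timed out, nothing was published
        out.swap(frame_);      // hand the data to the caller without a copy
        ready_ = false;
        return true;
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::vector<std::uint8_t> frame_;
    bool ready_ = false;
};
```

With a generous timeout the consumer wakes once per frame. With a very short timeout it wakes repeatedly with nothing to do, which is the "effective polling" pattern discussed later in this thread.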

Again, let's have a look at your code and see why your CPU usage is so high.

Regards,
AlexP

Posted: 16 January 2010 03:35 PM   [ # 2 ]
Jr. Member
Total Posts:  49
Joined  2010-01-15

The code is straightforward. With 60fps set, on my dual-core CPU I get 50% CPU utilization (one core in Performance shows almost full load). Increasing the FPS to 75 in code causes frames to drop => the real FPS reported by the code doesn’t go above 55fps.


#include "stdafx.h"
#include <time.h>
#include "CLEyeMulticam.h"


// Sample camera capture class

class CLEyeCameraCapture
{
  GUID _cameraGUID; 
  CLEyeCameraInstance _cam; 
  CLEyeCameraColorMode _mode; 
  CLEyeCameraResolution _resolution; 
  int _fps; 
  HANDLE _hThread; 
  bool _running; 
  LPBYTE m_pBuffer;
public: 

  CLEyeCameraCapture(GUID cameraGUID, 
                CLEyeCameraColorMode mode, 
                CLEyeCameraResolution resolution, int fps) : 
                _cameraGUID(cameraGUID), _cam(NULL), _mode(mode), 
                _resolution(resolution), _fps(fps), _running(false) 
  {
  }
  ~CLEyeCameraCapture()
  {
      if(m_pBuffer)
        delete [] m_pBuffer;
  }
  bool StartCapture() 
  { 
      if(!m_pBuffer)
        return _running = false;

      _running = true;
      // Start CLEye image capture thread
      _hThread = CreateThread(NULL, 0, &CLEyeCameraCapture::CaptureThread, this, 0, 0); 
      if(_hThread == NULL) 
      { 
        MessageBox(NULL,"Could not create capture thread","CLEyeMulticamTest", MB_ICONEXCLAMATION); 
        _running = false;
      } 
      return _running;
  } 
  void StopCapture() 
  { 
      if(!_running)  return; 
      _running = false; 
      WaitForSingleObject(_hThread, 1000); 
  } 
  void IncrementCameraParameter(int param) 
  { 
      if(!_cam)  return; 
      CLEyeSetCameraParameter(_cam, 
                      (CLEyeCameraParameter)param, 
                      CLEyeGetCameraParameter(_cam, (CLEyeCameraParameter)param)+10); 
  } 
  void DecrementCameraParameter(int param) 
  { 
      if(!_cam)  return; 
      CLEyeSetCameraParameter(_cam, 
                      (CLEyeCameraParameter)param, 
                      CLEyeGetCameraParameter(_cam, (CLEyeCameraParameter)param)-10); 
  } 
  void Capture() 
  { 
      _cam = CLEyeCreateCamera(_cameraGUID, _mode, _resolution, _fps); 
      // Set some camera parameters
      CLEyeSetCameraParameter(_cam, CLEYE_GAIN, 20); 
      CLEyeSetCameraParameter(_cam, CLEYE_EXPOSURE, 511); 

      // Start capturing
      DWORD dwClock = clock()-1;
      int i=0;
      CLEyeCameraStart(_cam); 
      // image capturing loop
      while(_running)
      { 
        m_pBuffer = new BYTE[((_resolution==CLEYE_QVGA)?(320*240):(640*480))*sizeof(RGBQUAD)];
        Sleep(0);
        if(CLEyeCameraGetFrame(_cam, m_pBuffer, 10))
        {
          printf("%d %g\r", i++, i/((clock()-dwClock)/1000.));
          LPBYTE pBuffer = new BYTE[((_resolution==CLEYE_QVGA)?(320*240):(640*480))*sizeof(RGBQUAD)];
          if(pBuffer)
          {
              CopyMemory(pBuffer, m_pBuffer, ((_resolution==CLEYE_QVGA)?(320*240):(640*480))*sizeof(RGBQUAD));
              delete [] pBuffer;
          }
        }
        if(m_pBuffer)
          delete [] m_pBuffer;
      } 
      // Stop camera capture
      CLEyeCameraStop(_cam); 
      // Destroy camera object
      CLEyeDestroyCamera(_cam); 
      _cam = NULL; 
  } 
  static DWORD WINAPI CaptureThread(LPVOID instance) 
  {
//      SetThreadPriority(GetCurrentThread(), REALTIME_PRIORITY_CLASS);
      // forward thread to Capture function
      CLEyeCameraCapture *pThis = (CLEyeCameraCapture *)instance; 
      pThis->Capture(); 
      return 0; 
  } 
};


int _tmain(int argc, _TCHAR* argv[])
{
//  SetPriorityClass(GetCurrentProcess(), REALTIME_PRIORITY_CLASS);
  CLEyeCameraCapture *pCam = NULL; 
  int numCams = CLEyeGetCameraCount(); 
  if(numCams == 0) 
  { 
      printf("No PS3Eye cameras detected\n"); 
      return -1; 
  } 
  printf("Found %d cameras\n", numCams); 

  GUID guid = CLEyeGetCameraUUID(0); 
  printf("Camera %d GUID: [%08x-%04x-%04x-%02x%02x%02x%02x%02x%02x%02x%02x]\n", 
              1, guid.Data1, guid.Data2, guid.Data3, 
              guid.Data4[0], guid.Data4[1], guid.Data4[2], 
              guid.Data4[3], guid.Data4[4], guid.Data4[5], 
              guid.Data4[6], guid.Data4[7]); 
  pCam = new CLEyeCameraCapture(guid, CLEYE_COLOR, CLEYE_VGA, 60); // <= change fps to 75 here => I'm not getting more than 55 <= frame drops start happening
  pCam->StartCapture();
  Sleep(100000);
  pCam->StopCapture(); 
  delete pCam;
  return 0;
}

Posted: 16 January 2010 04:03 PM   [ # 3 ]
Administrator
Total Posts:  585
Joined  2009-09-17

Here is the main loop as you have it:

while(_running)
{
    m_pBuffer = new BYTE[((_resolution==CLEYE_QVGA)?(320*240):(640*480))*sizeof(RGBQUAD)];
    Sleep(0);
    if(CLEyeCameraGetFrame(_cam, m_pBuffer, 10))
    {
        printf("%d  %g\r", i++, i/((clock()-dwClock)/1000.));
        LPBYTE pBuffer = new BYTE[((_resolution==CLEYE_QVGA)?(320*240):(640*480))*sizeof(RGBQUAD)];
        if(pBuffer)
        {
            CopyMemory(pBuffer, m_pBuffer, ((_resolution==CLEYE_QVGA)?(320*240):(640*480))*sizeof(RGBQUAD));
            delete [] pBuffer;
        }
    }
    if(m_pBuffer)
        delete [] m_pBuffer;
}

Here are the issues:

1. Why are you allocating/deallocating the memory buffer on every frame (twice)?
2. Why do you have Sleep(0) in your main loop? This will limit your true capture rate.
3. Why are you waiting only 10ms for a frame? With this you are effectively polling for a frame instead of waiting for it.
4. Why are you copying the memory buffer to another one once you already have it in your first buffer?
5. Why do you have printf inside your capture loop, printing on every frame?
6. On Windows, for better accuracy you should use GetTickCount() or QueryPerformanceCounter() instead of clock().

While all of these (besides Sleep(0)) could be the causes of your high CPU usage, the sleep will limit your capture frame rate to something like 55fps.
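
Points 5 and 6 can be combined into a small frame-rate meter that reads a monotonic clock and signals a report only on every 100th frame. The sketch below uses std::chrono::steady_clock as a portable stand-in for QueryPerformanceCounter(); `FpsMeter` and its members are hypothetical names, not part of the SDK.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: counts frames and reports the average frame rate.
class FpsMeter {
public:
    using Clock = std::chrono::steady_clock;

    explicit FpsMeter(Clock::time_point start = Clock::now()) : start_(start) {}

    // Call once per captured frame. Returns true on every 100th frame,
    // which is the moment the caller may want to print.
    bool addFrame()
    {
        ++frames_;
        return frames_ % kReportInterval == 0;
    }

    // Average frames per second between the start time and `now`.
    double averageFps(Clock::time_point now = Clock::now()) const
    {
        assert(now >= start_);
        const std::chrono::duration<double> elapsed = now - start_;
        return elapsed.count() > 0.0 ? frames_ / elapsed.count() : 0.0;
    }

    unsigned long frames() const { return frames_; }

private:
    static constexpr unsigned long kReportInterval = 100;
    Clock::time_point start_;
    unsigned long frames_ = 0;
};
```

Inside a capture loop this would be used as `if (meter.addFrame()) printf("%lu  %g\r", meter.frames(), meter.averageFps());`, so the console is touched once per 100 frames instead of once per frame.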

Please try replacing your main loop with this and let me know what your CPU usage is now:

m_pBuffer = new BYTE[((_resolution==CLEYE_QVGA)?(320*240):(640*480))*sizeof(RGBQUAD)];
while(_running)
{
    if(CLEyeCameraGetFrame(_cam, m_pBuffer, 1000))
    {
        // add elapsed time and
        // every 100th or so frame print
        // the average framerate using
        // printf("%d  %g\r", i++, i/((clock()-dwClock)/1000.));

        // Process your m_pBuffer here
    }
}
delete [] m_pBuffer;
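
The same allocate-once pattern can be written without manual new/delete. The following is a sketch under assumptions: `frameBytes`, `FrameSource`, and `captureLoop` are hypothetical names, the frame source is a stand-in for a blocking call such as CLEyeCameraGetFrame, and unlike the loop above it stops when the source returns false instead of checking a _running flag.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Bytes needed for one frame; 4 bytes per pixel matches sizeof(RGBQUAD).
inline std::size_t frameBytes(bool qvga, std::size_t bytesPerPixel = 4)
{
    assert(bytesPerPixel > 0);
    return (qvga ? 320u * 240u : 640u * 480u) * bytesPerPixel;
}

// Hypothetical frame source: fills dst and returns true while frames arrive.
using FrameSource = std::function<bool(std::uint8_t* dst, std::size_t size)>;
using FrameSink = std::function<void(const std::vector<std::uint8_t>&)>;

// Runs a capture loop with one buffer allocated up front and reused for
// every frame. Returns the number of frames processed.
inline std::size_t captureLoop(bool qvga, const FrameSource& getFrame,
                               const FrameSink& process)
{
    std::vector<std::uint8_t> buffer(frameBytes(qvga)); // single allocation
    std::size_t frames = 0;
    while (getFrame(buffer.data(), buffer.size())) {
        process(buffer);   // work on the data in place, no second copy
        ++frames;
    }
    return frames;         // the vector releases its memory on return
}
```

For VGA this allocates 1,228,800 bytes once, rather than allocating and freeing that amount twice per frame.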

AlexP

Posted: 16 January 2010 04:29 PM   [ # 4 ]
Jr. Member
Total Posts:  49
Joined  2010-01-15

Alex,

All the overhead I’m adding has nothing to do with the main issue.
Even if I remove that overhead, the result is the same => 50% CPU at 60fps (see code below).

Now, to your question about why I am doing that:
What exactly do you mean? Do you mean your capturing thread would live alone? I added some minimal functionality that actually does nothing but allocate and copy memory…


OK, for your convenience: this is the main loop:
...

        m_pBuffer = new BYTE[((_resolution==CLEYE_QVGA)?(320*240):(640*480))*sizeof(RGBQUAD)];
        while(_running)
        {
            if(CLEyeCameraGetFrame(_cam, m_pBuffer))
            {
                i++;
                if(!(i % 100))   // print every 100th frame
                    printf("%d  %g\r", i, i/((clock()-dwClock)/1000.));
            }
        }
        if(m_pBuffer)
            delete [] m_pBuffer;

... 

That one gives me 50% CPU and 59.5fps when capturing at 60fps and 40% CPU and just 51fps when capturing at 75fps.

Posted: 20 January 2010 08:53 AM   [ # 5 ]
New Member
Total Posts:  4
Joined  2010-01-19

Igor,

This is a Windows thing. If you do not use a message loop, this code will always eat a full CPU: in your case, one full core of your dual-core setup.

Just replace this with a classic Windows message loop that looks something like the one below; you must have the WaitMessage() call. If you are using a DirectShow pipeline, I think a message is sent on each new frame, etc. But any standard while loop will eat 50% of your CPU unless you have a WaitMessage or similar call.

MSG aMsg;
MSG* aMsgPtr = &aMsg;
while (PeekMessage(aMsgPtr, NULL, 0, 0, PM_REMOVE))
{
  TranslateMessage(aMsgPtr);
  DispatchMessage(aMsgPtr);

  if (isRunning)
      WaitMessage(); // this is the magic trick to not kill the cpu!!!
}

Matt

Posted: 20 January 2010 11:18 AM   [ # 6 ]
Jr. Member
Total Posts:  49
Joined  2010-01-15

tribalmatt,

Sorry, but your message and solution, while valid for Windows programs that have message pumps, have nothing to do with the topic of this thread.
The sample program that is provided with the SDK, and which is the point of discussion here, doesn’t have any message pump.
The CLEyeCameraGetFrame() function should be properly yielding to the system until the next frame arrives.
So, there are 2 questions to be answered:
1. Is CLEyeCameraGetFrame properly yielding to the system, so that it doesn’t “eat” any CPU while waiting for the next frame?
2. If the answer to (1) is yes: does it process the next frame efficiently enough not to “eat” as much CPU?

That's the point of this thread.

Posted: 02 February 2010 10:27 PM   [ # 7 ]
Administrator
Total Posts:  8
Joined  2009-09-17
igor1960 - 16 January 2010 04:29 PM

Alex,

All overhead I’m doing has nothing to do with main issues.

You were allocating memory twice. Even if we assume that’s a cheap operation (and it may not be as cheap as you suspect it is) there was a ton of unnecessary memory copying in your code, not to mention the effective conversion to polling, and the yield, which will cause unnecessary trips to kernel mode.

Alright. So now, we’ve eliminated a number of things, and we can try to get to the root of the high CPU usage. What is the CPU that you are running this test application on? Also, are you running debug or release builds? Are you running it inside the debugger?

Nik

Posted: 02 February 2010 10:50 PM   [ # 8 ]
Jr. Member
Total Posts:  49
Joined  2010-01-15

Looks like the latest version uses less CPU. However, it is still high.
On a dual-core 2.8 GHz Genuine Intel x86 Family 15 Stepping 4,
both Release and Debug builds produce the same result.

At 60FPS it is around 40% CPU, with both cores involved now.

Also, 75FPS is never reached now: capturing at 75FPS in reality produces <60fps…

If we run my sample code with elevated priorities (look at my code above, where you can see these lines commented out):
...
SetPriorityClass(GetCurrentProcess(), REALTIME_PRIORITY_CLASS);
in main and
SetThreadPriority(GetCurrentThread(), REALTIME_PRIORITY_CLASS);
in CaptureThread.

So, if you uncomment them and run the program, then when I request 75fps I really do get 75FPS, but
the price is ~60% CPU on both cores, with one core reaching almost full capacity.
