This blog post is based on my work on the Looking Glass project. The first thing you have to do when you want to create an augmented reality application is to get a real-time feed of the world in front of you. If you have a dedicated device like the Epson Moverio, you are good to go, because there is just a pair of semi-transparent glasses you can see through. But if you want to make augmented reality with Google Cardboard or a similar device, you have a big problem. Since you cannot see through your phone (at least not for now), you have to use your phone's camera. Basically, you take the stream from your phone's camera, do some lens distortion correction, and then show it on the screen. The phone I use for this experiment is a Nexus 5, which has some serious limitations. The sad thing is that, despite many promises from several years ago, the camera is still not 3D and can only do 30 fps. So I am stuck with a 2D camera at 30 fps, which, despite my doubts, works quite well for an augmented reality app.
A much, much bigger problem is latency. Humans can very easily spot latency problems, which you can test yourself by playing any fast computer game on a not-so-fast computer. For virtual reality that augments the real world, as Looking Glass does, the maximum tolerable latency is about 20 milliseconds (for comparison, a normal screen refreshes every 16 milliseconds). So in this time I have to capture a frame, drag it through about ten software layers (thanks, Android, for the extra Java overhead), some hardware, and a bunch of buffers, and then show it on screen to the user. With the current state of cell phone technology this is mission impossible, but I have tried to find the fastest way.
In order to find the fastest solution, I have developed four implementations with the identical purpose: to show the real-time feed from the camera on the screen. The implementations are done in C++ with the Qt framework, together with some Java glue. The first two implementations are just for comparison, to establish the baseline speed of the system. The other two are built for maximum performance, and I also provide source code for them.
- Implementation 1: The standard implementation with the Android API and plain Java. Simply put the preview from the camera into the app layout.
- Implementation 2: Since Qt still does not provide any access to the camera through the QCamera API, I have used the Camera element from QtQuick.
- Implementation 3: This implementation grabs frames from the camera through Android's Camera.PreviewCallback. Frames provided through this API are unfortunately in the NV21 colour format, so I copy the whole frame through JNI to C++, where I do the conversion to RGB. Then I upload the RGB image to an OpenGL texture, which is rendered on screen.
- Implementation 4: The last implementation is the most interesting one. Since I am rendering everything in OpenGL anyway, it would be great if I could get an OpenGL texture directly. Well, Android has a feature just for that: an OpenGL extension called OES_EGL_image_external. This extension creates a texture of type GL_TEXTURE_EXTERNAL_OES, which has some special features but mostly works like any other GL_TEXTURE_2D; you just need to rewrite your fragment shaders to support it. Thankfully, Android's implementation of OpenGL is basically just a wrapper over the underlying C implementation. This allows me to create the OpenGL texture in the NDK with C, pass the texture id through JNI to Java within the current OpenGL context, and then let Java feed the frames to the texture.
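To make the conversion step in implementation 3 concrete, here is a minimal, self-contained sketch of the NV21-to-RGB per-pixel math in plain Java (no Android dependencies). I use the common BT.601 full-range coefficients here; other coefficient sets exist, and the actual conversion is better done in C++ for speed, as described above:

```java
public class Nv21ToRgb {
    // Convert one NV21 frame (full-resolution Y plane followed by an
    // interleaved, half-resolution VU plane) into packed RGB bytes.
    // Width and height are assumed to be even.
    public static byte[] convert(byte[] nv21, int width, int height) {
        byte[] rgb = new byte[width * height * 3];
        int frameSize = width * height;
        for (int row = 0; row < height; row++) {
            for (int col = 0; col < width; col++) {
                int y = nv21[row * width + col] & 0xFF;
                // Each 2x2 block of pixels shares one V and one U sample.
                int uvIndex = frameSize + (row / 2) * width + (col & ~1);
                int v = (nv21[uvIndex] & 0xFF) - 128;     // V comes first in NV21
                int u = (nv21[uvIndex + 1] & 0xFF) - 128;
                int r = clamp(Math.round(y + 1.402f * v));
                int g = clamp(Math.round(y - 0.344f * u - 0.714f * v));
                int b = clamp(Math.round(y + 1.772f * u));
                int out = (row * width + col) * 3;
                rgb[out] = (byte) r;
                rgb[out + 1] = (byte) g;
                rgb[out + 2] = (byte) b;
            }
        }
        return rgb;
    }

    private static int clamp(int x) {
        return x < 0 ? 0 : (x > 255 ? 255 : x);
    }
}
```

Doing this loop for every pixel of every preview frame is exactly the per-frame cost that implementation 4 avoids by keeping the frame on the GPU.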
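As an illustration of the fragment-shader change that implementation 4 requires, a minimal GLSL ES fragment shader sampling a GL_TEXTURE_EXTERNAL_OES texture might look like this (the uniform and varying names are my own illustrative choices):

```glsl
#extension GL_OES_EGL_image_external : require
precision mediump float;

// The camera frame arrives through the external texture target,
// so the sampler type is samplerExternalOES instead of sampler2D.
uniform samplerExternalOES cameraTexture;
varying vec2 textureCoord;

void main() {
    gl_FragColor = texture2D(cameraTexture, textureCoord);
}
```

On the Java side, the standard way to feed camera frames into such a texture is to wrap the texture id in a SurfaceTexture, hand it to Camera.setPreviewTexture(), and call updateTexImage() on it before each draw.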
For the testing I have used a GoPro camera recording at 240 fps and a Nexus 5 with CyanogenMod 11. The phone was pointed at a small LED, and the GoPro was pointed at both the LED and the phone's screen. After each LED turn-on, I counted the number of frames between the event itself and its appearance on the phone's screen. The numbers of frames are written in the table below. From several identical experiments I computed the average number of frames and, from that, an approximate latency (each frame is approx. 4.2 ms).
| | Implementation 1 | Implementation 2 | Implementation 3 | Implementation 4 |
|---|---|---|---|---|
| Average | 51.6 frames | 42.8 frames | 45.6 frames | 33.0 frames |
| Latency | approx. 216 ms | approx. 179 ms | approx. 191 ms | approx. 138 ms |
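The conversion from counted frames to milliseconds is simple arithmetic; a small sketch, assuming the 240 fps footage (approx. 4.2 ms per frame) and truncation to whole milliseconds:

```java
public class LatencyFromFrames {
    // The GoPro records at 240 fps, so one recorded frame is about 4.2 ms.
    static final double MS_PER_FRAME = 4.2;

    // Average number of GoPro frames between the LED lighting up and the
    // event appearing on the phone's screen -> approximate latency in ms.
    static int latencyMs(double averageFrames) {
        return (int) (averageFrames * MS_PER_FRAME);
    }
}
```

For example, an average of 33.0 frames comes out to approx. 138 ms.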
From these results you can clearly see that today's Android phones are really not well suited for augmented reality, due to the massive latency. The first implementation clearly shows that Android's own camera preview is not fast at all, probably due to Java's overhead and software image rendering. The speed of implementations 2 and 3 is similar; from reading Qt's source code I found that Digia's implementation does pretty much the same operations as mine. The last one is the best I have found. I tried to find out why, but I was not able to dig deep enough through Android's source to find the actual cause, because it heavily depends on the camera's drivers, which are closed source. For the Looking Glass project I have chosen implementation number four, and even though it is not perfect, it is good enough for augmented reality. When I get to the laboratory I am planning to redo this experiment with more samples (the current ones vary quite a lot) and also test Sailfish OS and Ubuntu Phone.