Banuba offers cutting-edge AR technology for mobile apps

Banuba offers a unique partnership opportunity for up to 10 app developers across the world to create AR-enabled products.

The partners will receive:

  • Access to Banuba’s mobile AR SDK
  • Marketing and business development support
  • Funding for a joint venture as needed

Among the first partners to take up the offer is game developer Inventain. Their team has received €1 million to develop their project jointly with us.

This initiative is backed by major investors as part of an international programme to develop AI and AR technologies.

Introducing Banuba technology

The Banuba AR SDK is a unique product for mobile app developers. It combines several advanced technologies and runs even on constrained devices. It features 3D face detection and tracking; 3D face analysis, including detection of skin color, hair color and hair style; and the separation of objects from the background.

High app performance is achieved thanks to:

  • A 3D math model of the head built into the solution, which makes the identification of 2D points unnecessary;
  • Datasets tuned to work with Banuba algorithms;
  • Optimisation for specific architectures, namely Apple A9, A10, A11 CPUs and Android.

Other solutions create filters by first identifying 2D points on the face and only then building a 3D model of the head via nonlinear equations. Banuba technology works differently: it establishes a 3D model of the head directly, skipping the identification of 2D points, which makes the result more accurate.

The 3D math model of the head (Face Kernel ™), developed by Banuba, reduces all of its possible transformations to a limited number of variables.
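
To illustrate what reducing a head to a limited number of variables can look like, here is a minimal sketch. It is not the Face Kernel itself, whose internals are not described here; the names `HeadParams` and `pose_vertices`, the yaw-only rotation and the uniform scale are assumptions made for brevity.

```python
import math
from dataclasses import dataclass


@dataclass
class HeadParams:
    """Hypothetical compact description of a head pose (illustrative only)."""
    yaw: float    # rotation about the vertical (y) axis, in radians
    scale: float  # uniform size factor
    tx: float     # translation along x
    ty: float     # translation along y
    tz: float     # translation along z


def pose_vertices(template, p):
    """Apply the few pose variables in `p` to every vertex of a template mesh."""
    c, s = math.cos(p.yaw), math.sin(p.yaw)
    posed = []
    for x, y, z in template:
        # Rotate about the y axis, then scale and translate.
        rx = c * x + s * z
        rz = -s * x + c * z
        posed.append((p.scale * rx + p.tx, p.scale * y + p.ty, p.scale * rz + p.tz))
    return posed


# A toy three-vertex "mesh"; a real head model would have far more vertices.
template = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
params = HeadParams(yaw=math.pi / 2, scale=2.0, tx=0.0, ty=0.0, tz=5.0)
posed = pose_vertices(template, params)
```

In a sketch like this, a tracker only has to estimate five numbers per frame, however many vertices the template contains.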

Creating the 3D model directly also leads to a higher degree of precision, because it makes the complicated calculations of the 2D-to-3D transformation unnecessary.

Creating a suitable data set is a vast challenge for the recognition of face structure. Some tasks related to face segmentation are not structured, remaining very subjective for the developer of a data set. For instance, there is no consistent classification of hairstyles.

To create such data sets, Banuba uses semi-supervised metric learning and generative adversarial networks (GANs). Information from the tracker is also used, with the help of a neural network.

Overall, Banuba prefers hand-crafted, hand-tuned code to compilers, relying on human ability where it is necessary.

In machine learning, some say, jokingly, that a data set is more important than the algorithm. But this is perhaps not just a joke.

Banuba doesn’t use pre-existing data sets – or, to be exact, Banuba uses them only as a part of its own data sets.

On the one hand, this is because some of the required data sets simply do not exist. On the other hand, a correctly constructed data set, if tuned to the learning algorithm, can significantly reduce the learning error.

To create data sets, Banuba uses semi-supervised learning, a constrained form of active learning, and domain adaptation methods.

Today, unfortunately, or perhaps fortunately, taking a data set, applying your preferred machine learning method and compiling the result with the “-O3” flag is no longer enough to develop a state-of-the-art solution.

Banuba has designed in-house math models to significantly simplify the effort. This cuts the execution time on a smartphone, reduces the learning time for the algorithm, and allows the use of a larger data set, which in turn will improve the quality of the operation.

Banuba uses a rather unconventional form of deep learning, mixing CNNs and different variations of Random Forests.

In addition, Banuba has developed unconventional types of neural network layers, tuned to specific architectures, namely Apple A9, A10, A11 CPUs and Android.

All these features allow the Banuba AR SDK to run even on constrained devices, providing effective and fast performance on 90% of smartphones. Last but not least, the Banuba AR SDK does not drain the battery the way many power-hungry applications do. This significantly enhances the experience of those who use applications built with the Banuba AR SDK.

Below is a description of the Banuba AR SDK, its main functionalities, features and possible applications.


Face AR SDK incorporates a mix of technologies, including face detection and tracking, eye gaze detection, estimation of distance from the smartphone, recognition of skin and hair color and hair style, and emotion recognition, the synergy of which creates substantial user benefits for various applications.


1.1. Face detection and head pose tracking

This technology detects frontal faces. Once a face has been detected, the algorithm switches to head-pose tracking mode, using the position of the head in the previous image as the initial approximation. If the face is lost, the algorithm switches back to face detection mode.
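
The switching between the two modes can be sketched as a small loop. This is an illustration under stated assumptions, not the SDK's API: `detect` and `track` are hypothetical callables that return a pose, or `None` when no face is found, and the toy "frames" below are plain numbers standing in for images.

```python
def run_tracker(frames, detect, track):
    """Yield a pose (or None) for each frame, switching between the two modes.

    `detect(frame)` and `track(frame, previous_pose)` are hypothetical
    callables that return a pose, or None when the face is not found.
    """
    pose = None
    for frame in frames:
        if pose is None:
            pose = detect(frame)       # detection mode: no previous pose to start from
        else:
            pose = track(frame, pose)  # tracking mode: start from the previous pose
        yield pose


# Toy stand-ins: a "frame" is a head position, or None when no face is visible.
def toy_detect(frame):
    return frame


def toy_track(frame, previous_pose):
    # Tracking fails if the face is missing or has jumped too far.
    if frame is None or abs(frame - previous_pose) > 1:
        return None
    return frame


poses = list(run_tracker([None, 10, 11, 50, 50, 51], toy_detect, toy_track))
```

Here the jump from 11 to 50 makes tracking fail, so the next frame is handled in detection mode again.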

Because it is based on a directly inferred 3D model of the head (rather than one transformed from a 2D model), the technology works even with a low signal-to-noise ratio (SNR) and in poor lighting conditions.

The incorporated model can forecast the appearance of the head in the subsequent frame, increasing stability and precision.
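
One simple way to forecast the next frame is constant-velocity extrapolation of the pose variables. The source does not say which predictor Banuba uses, so the function below is only an assumed stand-in to show the idea.

```python
def predict_next(previous, current):
    """Extrapolate each pose variable linearly: next = current + (current - previous)."""
    return tuple(2 * c - p for p, c in zip(previous, current))


# Two consecutive poses, e.g. (yaw in degrees, horizontal position in pixels).
predicted = predict_next((0.0, 10.0), (1.0, 12.0))
```

The prediction can then serve as the starting point for the tracker in the following frame.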

The size of the data set is 300,000 faces.


  • High performance (60 fps)
  • High quality
  • Extreme angles, ranging from –90° to +90°
  • Efficient operation in poor lighting conditions
  • Operation with occlusions of up to 50% of the face
  • Stable detection and resistance to partial occlusions of the face, including glasses and complex haircuts
  • Depending upon needs, a 3D model of a face with 64 to 3,308 vertices is created
  • Supports 360-degree rotation of the smartphone camera
  • Estimation of distance from the smartphone
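
As a rough illustration of how distance to the face can be estimated from a single camera, the sketch below applies the pinhole camera model to the interpupillary distance. This is a textbook approach, not necessarily the one used in the SDK; the function name and the default of 63 mm (a typical adult value) are assumptions.

```python
def estimate_distance_mm(ipd_pixels, focal_length_pixels, ipd_mm=63.0):
    """Pinhole model: distance = focal length * real size / size in the image.

    `ipd_mm` is an assumed typical adult interpupillary distance.
    """
    if ipd_pixels <= 0:
        raise ValueError("ipd_pixels must be positive")
    return focal_length_pixels * ipd_mm / ipd_pixels


# Eyes 126 px apart, seen through a camera with a 1000 px focal length.
distance = estimate_distance_mm(126, 1000)
```

Because real interpupillary distances vary from person to person, an estimate of this kind is approximate unless it is calibrated per user.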

1.2. Eye tracking and gaze detection

Thanks to this technology, it is possible not just to “track” a person’s gaze but also to control a smartphone’s functions with it. To that end, an algorithm detects micro-movements of the eye with subpixel accuracy in real time, along with certain points of the eye. Based on that data, a vector of movement can be created.

Banuba’s face recognition algorithm helps to measure the distance to various points on a scanned surface with a high degree of precision and to detect its shape. It can detect, for instance, whether the user’s eyes are open or closed.
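
A common way to decide whether an eye is open from a handful of eye points is the eye aspect ratio (EAR). The sketch below uses that published technique as a stand-in; the source does not state how Banuba makes this decision, and the six-point layout and the 0.2 threshold are assumptions.

```python
import math


def eye_aspect_ratio(p1, p2, p3, p4, p5, p6):
    """EAR = (|p2 - p6| + |p3 - p5|) / (2 * |p1 - p4|).

    p1 and p4 are the eye corners, p2 and p3 lie on the upper lid,
    p6 and p5 lie on the lower lid directly below them.
    """
    vertical = math.dist(p2, p6) + math.dist(p3, p5)
    horizontal = math.dist(p1, p4)
    return vertical / (2.0 * horizontal)


def is_eye_open(landmarks, threshold=0.2):
    """Treat the eye as open when the ratio exceeds an assumed threshold."""
    return eye_aspect_ratio(*landmarks) > threshold


open_eye = [(0, 0), (1, 1), (2, 1), (3, 0), (2, -1), (1, -1)]
closed_eye = [(0, 0), (1, 0.1), (2, 0.1), (3, 0), (2, -0.1), (1, -0.1)]
states = (is_eye_open(open_eye), is_eye_open(closed_eye))
```

Counting transitions from open to closed and back over consecutive frames gives a simple blink detector.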


  • High degree of precision
  • Eye pupil detection and tracking
  • Eye states: open & closed
  • Eye blinking
  • Attention tracking

Facial motion capture

The technology of Face Motion Capture is based on scanning the movements of a person’s face and converting them to computer animation for movies, games, or real-time avatars. It can operate either in real time or on the user’s previously saved data and/or face motion models. Because it is derived from the movements of real people, the technology produces more realistic and nuanced character animation than manual animation does.


  • Fast execution – the model can operate in real time based on one or several frames; each subsequent frame may improve the model.
  • Can be integrated into the Face Recognition pipeline in order to develop precise models of users combining both visual similarity and the resemblance of facial gestures, emotions and other motion-related features.

3D face motion capture applications

  • Face recognition (biometrics)
  • Zone of autofocus (photography)
  • Unconventional elements of the user interface, games
  • Measuring user interaction and engagement with in-app ads
  • Behavioral analytics of mobile UI/UX based on eye gaze tracking
  • Work style analysis: eye-blink frequency, analysis of employee attention and concentration, estimation of the employee’s type of activity (typing, reading), calculation of on-screen time.


2.1 Face segmentation

Face segmentation is a specific computer vision task aimed at correctly assigning labels of face regions, such as nose, mouth, eye, hair, etc., to each pixel in a face image. Our face segmentation techniques include complex cascaded machine learning algorithms in combination with color model and Monte Carlo approaches, which lead to the precise detection of eyes, their structure (iris, pupil, eyeball, etc.) and also the nose, ear, cheek, chin, mouth, lip, eyebrow and forehead areas.
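
The final step of any per-pixel labeling can be sketched as picking the highest-scoring class for each pixel. The label set and the nested-list score format below are assumptions for illustration; they say nothing about how the scores themselves are produced.

```python
LABELS = ("background", "skin", "hair", "eye", "mouth")  # illustrative subset


def label_pixels(scores):
    """Map per-pixel class scores to label names.

    `scores[row][col]` is a list with one score per entry of LABELS.
    """
    return [
        [LABELS[max(range(len(px)), key=px.__getitem__)] for px in row]
        for row in scores
    ]


# A 1 x 2 "image" with five class scores per pixel.
scores = [[[0.1, 0.7, 0.2, 0.0, 0.0], [0.05, 0.1, 0.8, 0.05, 0.0]]]
labels = label_pixels(scores)
```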


  • Access over a convenient API
  • Separate parts of face can be detected for further analysis

2.2 Evaluation of anthropometric parameters

Face detection and face tracking algorithms make it possible to recognize more refined facial patterns and to recreate the face’s semantics, expressions and anthropometrics, to analyze all of its main parts and, basically, to “understand” it.

Facial anthropometry refers to the measurement of individual facial features. It involves the systematic measurement of the dimensional descriptors of facial size and shape and of the face’s geometrical properties.

Our novel algorithm automatically detects a set of anthropometric facial fiducial points that are associated with these features.

Features and applications

  • Access over a convenient API
  • Reconstruction of face geometry “cleaned” of facial expressions
  • Creation of a person’s caricature avatars on the fly

2.3 Skin and hair color

Banuba has developed a library for detecting hair and skin color for iOS.

Detection of hair color is based on the estimation of geometrical and color-space characteristics of the human face. At the first stage, an image is pre-processed: the face is detected and scaled by the interpupillary distance, and the eye line is aligned. As a result, the image of a person’s face is normalized and aligned.
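
The normalization step can be sketched as a similarity transform computed from the two eye centers. The function names, the target interpupillary distance of 100 units and the choice of the left eye as the origin are assumptions made for this example.

```python
import math


def eye_alignment(left_eye, right_eye, target_ipd=100.0):
    """Return (rotation, scale) that levels the eye line and fixes its length."""
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    ipd = math.hypot(dx, dy)
    if ipd == 0:
        raise ValueError("eye centers must differ")
    tilt = math.atan2(dy, dx)       # current angle of the eye line
    return -tilt, target_ipd / ipd  # rotate back by the tilt, then rescale


def normalize_point(point, left_eye, rotation, scale):
    """Move `point` into the normalized frame with the left eye at the origin."""
    x = point[0] - left_eye[0]
    y = point[1] - left_eye[1]
    c, s = math.cos(rotation), math.sin(rotation)
    return (scale * (c * x - s * y), scale * (s * x + c * y))


left, right = (10.0, 10.0), (40.0, 50.0)
rotation, scale = eye_alignment(left, right)
aligned_right = normalize_point(right, left, rotation, scale)
```

After normalization the right eye lands at approximately (100, 0), so regions such as the area above the forehead can be located at fixed offsets.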

The detected area above the person’s forehead is used for hair color detection as it is less sensitive to the head’s turns than other possible areas of hair. Hair color is detected by finding sharp intensity patterns in the region of interest and analyzing color features above the hairline. There will be no such sharp intensity patterns for bald individuals, hence our technology allows us to detect the lack of hair as well.

For the skin color detector, color samples are taken at specific points of the face that other algorithms (based on our face tracking technology) have previously identified as belonging to the facial skin.
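
Sampling color at known skin points can be sketched as follows. Taking the per-channel median is an assumption chosen here because it tolerates a few bad samples (for example a point that lands on a shadow); the source does not say how the samples are combined.

```python
from statistics import median


def median_color(image, points):
    """Per-channel median of the pixels at `points`, given as (x, y) pairs.

    `image[y][x]` is an (r, g, b) tuple.
    """
    samples = [image[y][x] for x, y in points]
    return tuple(median(channel) for channel in zip(*samples))


image = [
    [(200, 150, 120), (210, 160, 130)],
    [(10, 10, 10), (205, 155, 125)],  # the dark pixel stands in for an outlier
]
skin = median_color(image, [(0, 0), (1, 0), (0, 1)])
```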


  • Precise color measurement


  • Visual skin color and skin tone correction
  • Skin-related disorder detection
  • Virtual makeup

2.4 Hair style detection

This technology is based on convolutional neural networks. The training set consists of images of men and women, grouped by hairstyle. Learning is the process of matching an image with the most relevant hairstyle sample.

Semi-supervised metric learning was used to create the data set.

A GAN was then trained to expand the data set, and the final network was trained on the expanded set. To improve quality, a loss function selected specifically for the task was used.


  • Detecting hair style, hair color
  • Changing hair color
  • Haircuts of any shape can be detected, regardless of hair length
  • The algorithm distinguishes with pixel precision between hair, face and other parts, such as beard, mustache or glasses
  • Matting is done with pixel precision. If the background is seen through the hair, it is detected as part of the hair.


  • Virtual hair salon: trying out new hair colors and hair styles
  • Detecting hair style for 3D avatar creation

2.5 Emotion and expression recognition

The underlying idea for this technology is that people’s emotions are often reflected on their faces, and the computer’s ability to “read a face” would make it possible to deliver more personalized electronic content.

The variables of the model returned by the recognition algorithm can be used either directly or transformed into parameters for another model, such as the Facial Action Coding System (FACS), which describes the facial muscle movements that correspond to specific emotions.

Our technology allows us to detect six basic emotions: anger, disgust, fear, happiness, sadness, and surprise.
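
Turning raw model outputs into one of the six emotions can be sketched with a softmax followed by picking the most probable class. The score vector and the function names are hypothetical; the source does not describe the output format of the recognition algorithm.

```python
import math

EMOTIONS = ("anger", "disgust", "fear", "happiness", "sadness", "surprise")


def emotion_probabilities(scores):
    """Softmax over six raw scores, one per entry of EMOTIONS."""
    if len(scores) != len(EMOTIONS):
        raise ValueError("expected one score per emotion")
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return {name: e / total for name, e in zip(EMOTIONS, exps)}


def top_emotion(scores):
    """Name of the most probable emotion."""
    probs = emotion_probabilities(scores)
    return max(probs, key=probs.get)


detected = top_emotion([0.0, 0.0, 0.0, 3.0, 0.0, 0.0])
```

Keeping the full probability dictionary, rather than only the top label, lets an application ignore low-confidence frames.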



  • Mood-related content in mobile applications, such as three-dimensional visual masks that reflect users’ feelings while they are communicating in mobile video chats
  • Targeted advertisement
  • Detection of tiredness and degree of stress
  • Use of emotional reactions to a product or content in empathic apps and advertisements
  • Creation of human-friendly digital products which react to human facial expressions
  • Detection of emotional states of patients for health purposes (helping schizophrenic patients)

2.6 Face Beautification

This technology is used for creating visually beautified images of users based on makeup standards. Anthropometric data and facial expressions are analyzed and corrected in real time.


  • Smoothing of skin
  • Correction of face skin tone
  • Whitening of eyes and teeth
  • Correction of face shape (make it slimmer, wider, increase/decrease eye size, change the shape of the nose and head proportions)
  • Changing hair color
  • Improve face symmetry
  • Shape and color of eyebrows
  • Correction of lips’ shape
  • Virtual makeup
  • Face morphing


  • Improving the user’s look during video chat
  • Plastic surgery: visual aids of the future surgery outcome during consultation with a patient
  • Cosmetics: application of makeup / choosing what makeup product fits best
  • Demoing effects from the application of face skin products
  • Fixing smartphone camera distortions
  • Make selfies look more visually attractive


3. Separating user image from the background

This technology can be used for the real-time replacement of backgrounds with both static and animated textures. Backgrounds can be changed into a number of optional presets during video calls, or used as a funny effect in advertising.

Banuba has developed a library to separate a person’s image from the background for Apple iOS.

The technology is based on convolutional neural networks that take color images as the input and produce a probability mask showing whether each pixel belongs to the class “person” or to the class “background.” This approach ensures high performance and good results for real-time background replacement.
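
Once a probability mask is available, replacing the background is a per-pixel blend. The sketch below shows standard alpha compositing on nested lists of RGB tuples; the data layout and the function name are assumptions for the example.

```python
def composite(foreground, background, mask):
    """Blend per pixel: out = p * fg + (1 - p) * bg, where p is P(person)."""
    out = []
    for fg_row, bg_row, mask_row in zip(foreground, background, mask):
        out.append([
            tuple(round(p * f + (1.0 - p) * b) for f, b in zip(fg, bg))
            for fg, bg, p in zip(fg_row, bg_row, mask_row)
        ])
    return out


camera = [[(200, 100, 0), (200, 100, 0), (200, 100, 0)]]
jungle = [[(0, 0, 255), (0, 0, 255), (0, 0, 255)]]
mask = [[1.0, 0.0, 0.25]]  # person, background, mostly background
result = composite(camera, jungle, mask)
```

Intermediate mask values, such as those along hair edges, produce a soft transition instead of a hard cut-out.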

The problem of lack of data sets for separation of objects from the background is resolved by the creation of a small initial data set, which is increased by active learning and subsequent fine-tuning.
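
One widely used active learning strategy is uncertainty sampling: ask a human to label the examples the current model is least sure about. The source does not name the strategy Banuba uses, so the sketch below is an assumed example, with one hypothetical confidence score per candidate image.

```python
def select_for_labeling(probabilities, budget):
    """Indices of the `budget` samples whose P(person) is closest to 0.5."""
    ranked = sorted(
        range(len(probabilities)),
        key=lambda i: abs(probabilities[i] - 0.5),
    )
    return ranked[:budget]


# Model confidence for five unlabeled images.
chosen = select_for_labeling([0.99, 0.52, 0.10, 0.45, 0.80], budget=2)
```

The selected images are labeled, added to the data set, and the model is fine-tuned before the next round.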

A correctly selected data set helps to obtain optimal results and high-quality implementation.


  • Augmentation of raw output with classic algorithms for computer vision and signal filtration – lightweight post processing (deletion of small contours, precise definition of borders, additional image recognition etc.).
  • Post-processing, including matting, filters and removal of small artifacts
  • Portrait mode and bokeh effect
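
The deletion of small contours mentioned in the list above can be sketched as removing small connected regions from a binary mask. This is a generic flood-fill implementation on nested lists, given as an assumption-laden illustration rather than the SDK's actual post-processing; `min_size` is a hypothetical parameter.

```python
from collections import deque


def remove_small_regions(mask, min_size):
    """Clear 4-connected foreground regions with fewer than `min_size` pixels."""
    height = len(mask)
    width = len(mask[0]) if height else 0
    cleaned = [row[:] for row in mask]
    seen = [[False] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if not mask[y][x] or seen[y][x]:
                continue
            # Breadth-first search collects one connected region.
            region, queue = [], deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                region.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(region) < min_size:
                for cy, cx in region:
                    cleaned[cy][cx] = 0
    return cleaned


noisy = [
    [1, 1, 0, 0],
    [1, 1, 0, 1],  # the lone pixel on the right is an artifact
    [0, 0, 0, 0],
]
cleaned = remove_small_regions(noisy, min_size=2)
```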


  • Replacement of an unsuitable background, noise removal
  • Animation effects in the background that are changed by the user as part of interactivity
  • Animated emotion-related background
  • Advertisement
  • Adding fancy colors
  • Protection of privacy
  • 360-degree 2D and 3D backgrounds for educational purposes, e.g., for mixed reality
  • “Hollywood effects” on a mobile phone
  • Replacing backgrounds for practical purposes (e.g. during a business call) or to entertain (e.g. jungle instead of a wall)
  • Editing “boring” backgrounds to create perfect videos
  • Removing unwanted objects or people from videos
