Banuba offers a unique partnership opportunity for up to 10 app developers across the world to create AR-enabled products.
The partners will receive:
- Access to Banuba’s mobile AR SDK
- Marketing and business development support
- Funding for a joint venture as needed
Among the first partners to take up the offer is game developer Inventain. Their team has received €1 million to develop their project jointly with us.
This initiative is backed by major investors as part of an international programme to develop AI and AR technologies.
Introducing Banuba technology
The Banuba AR SDK is a unique product for mobile app developers: it combines several advanced technologies, such as 3D face detection and tracking, 3D face analysis (including detection of skin and hair colour and hair style), and separation of objects from the background, and it runs even on constrained devices.
High app performance is achieved thanks to:
– A 3D math model integrated into the head-tracking solution, which makes it possible to skip 2D point identification;
– Datasets tuned to work with Banuba algorithms;
– Optimisation for specific architectures, namely Apple A9, A10 and A11 CPUs and Android devices.
Unlike other solutions, which first identify 2D points on the face and then build a 3D model of the head from them via nonlinear equations, Banuba technology establishes the 3D model of the head directly, skipping 2D point identification altogether.
The 3D math model of the head (Face Kernel ™), developed by Banuba, reduces all of its possible transformations to a limited number of variables.
Direct creation of the 3D model also leads to a higher degree of precision, as it makes the complicated calculations involved in the 2D-to-3D transformation unnecessary.
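As an illustration of reducing head transformations to a limited number of variables, the sketch below parameterises a 3D head model's pose with just six values and projects it to 2D. This is a generic weak-perspective projection, not Banuba's Face Kernel; the function names and the six-parameter pose are assumptions for illustration.

```python
import numpy as np

def rotation(yaw, pitch, roll):
    """Compose a 3D rotation matrix from Euler angles (radians)."""
    cy, sy = np.cos(yaw), np.sin(yaw)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cr, sr = np.cos(roll), np.sin(roll)
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rx = np.array([[1, 0, 0], [0, cp, -sp], [0, sp, cp]])
    Rz = np.array([[cr, -sr, 0], [sr, cr, 0], [0, 0, 1]])
    return Rz @ Rx @ Ry

def project_head(mean_shape, yaw, pitch, roll, scale, tx, ty):
    """Project a 3D head shape (N, 3) to 2D with a weak-perspective
    camera. The whole pose is described by only six variables."""
    rotated = mean_shape @ rotation(yaw, pitch, roll).T
    return scale * rotated[:, :2] + np.array([tx, ty])
```

Fitting then becomes an optimisation over this small set of variables instead of locating dozens of independent 2D points.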
Creating a suitable data set is a vast challenge for the recognition of face structure. Some tasks related to face segmentation are not structured, remaining very subjective for the developer of a data set. For instance, there is no consistent classification of hairstyles.
To create such data sets, Banuba uses semi-supervised metric learning and generative adversarial networks (GANs). Information from the tracker is processed with the help of a neural network.
Overall, Banuba prefers hand-crafted, hand-tuned code to compiler-generated code, relying on human skill where necessary.
In machine learning, it is sometimes said, half-jokingly, that the data set matters more than the algorithm. Perhaps it is not just a joke.
Banuba doesn’t use pre-existing data sets – or, to be exact, Banuba uses them only as a part of its own data sets.
On the one hand, this is because some of the required data sets simply do not exist. On the other hand, a correctly constructed data set, if tuned to the learning algorithm, can significantly reduce the learning error.
To create data sets, Banuba uses semi-supervised learning, a constrained form of active learning, and domain-adaptation methods.
Today, for better or worse, taking a data set, applying your preferred machine learning method and compiling the result with the “-O3” flag is no longer enough to produce a state-of-the-art solution.
Banuba has designed in-house math models that significantly simplify the computation. This cuts the execution time on a smartphone, reduces the learning time of the algorithm, and allows the use of a larger data set, which in turn improves the quality of operation.
Banuba uses a rather unconventional form of deep learning, mixing CNNs and different variations of Random Forests.
In addition, Banuba has developed unconventional types of neural network layers, tuned to specific architectures, namely Apple A9, A10 and A11 CPUs and Android devices.
All these features allow the Banuba AR SDK to run even on constrained devices, providing effective and fast performance on 90% of smartphones. Last but not least, the Banuba AR SDK will not drain the battery the way many comparable applications do. This significantly enhances the experience of users of applications built with the Banuba AR SDK.
Below is a description of the Banuba AR SDK, its main functionalities, features and possible applications.
Face AR SDK
Face AR SDK incorporates a mix of technologies, including face detection and tracking, eye gaze detection, estimation of distance from the smartphone, recognition of skin and hair colour and hair style, and emotion recognition, the synergy of which creates substantial user benefits for various applications.
1. 3D FACE MOTION TRACKING
1.1. Face detection and head pose tracking
This technology detects a frontal face. Once the face has been detected, the algorithm switches to head-pose tracking mode, using the position of the head in the previous frame as the initial approximation. If the face is lost, the algorithm switches back to face detection mode.
Because it is based on a directly inferred 3D model of the head (rather than one transformed from a 2D model), the technology also operates with a low SNR (signal-to-noise ratio) and in poor lighting conditions.
The incorporated model can forecast the appearance of the head in the subsequent frame, increasing stability and precision.
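The detect-then-track state machine described above can be sketched as a simple loop. This is a minimal sketch; `detect` and `track` are hypothetical callbacks standing in for the real detector and tracker, not SDK functions.

```python
def track_faces(frames, detect, track):
    """Alternate between detection and tracking modes.

    `detect(frame)` returns a head pose or None when no face is found;
    `track(frame, prev)` refines the previous pose or returns None when
    the face is lost, triggering a fall-back to detection."""
    pose = None
    poses = []
    for frame in frames:
        if pose is None:
            pose = detect(frame)        # detection mode
        else:
            pose = track(frame, pose)   # tracking mode, seeded with prev pose
            if pose is None:
                pose = detect(frame)    # face lost: switch back to detection
        poses.append(pose)
    return poses
```

Seeding the tracker with the previous pose is what makes tracking much cheaper than detecting from scratch in every frame.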
The size of the data set is 300,000 faces.
- High performance (60 fps)
- High quality
- Extreme angles, ranging from –90° to +90°
- Efficient operation in poor lighting conditions
- Operation with occlusions of up to 50% of the face
- Stable detection and resistance to partial occlusions of the face, including glasses and complex haircuts
- Depending on needs, a 3D face model with 64 to 3,308 vertices is created
- Supports 360-degree rotation of the smartphone camera
- Estimation of distance from the smartphone
1.2. Eye tracking and gaze detection
Thanks to this technology, it is possible not just to track a person’s gaze but also to control a smartphone’s functions with it. To that end, an algorithm detects micro-movements of the eye with sub-pixel accuracy in real time and locates specific points of the eye. From these data, a vector of movement can be computed.
Banuba’s face recognition algorithm helps to measure the distance to various points on a scanned surface with a high degree of precision and to detect its shape. It can detect, for instance, whether the user’s eyes are open or closed.
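The two ideas above, deriving a movement vector from successive pupil positions and classifying the eye as open or closed, can be sketched in a few lines. This is an illustrative sketch only; the function names, coordinate convention and threshold value are assumptions, not the SDK's API.

```python
def gaze_vector(pupil_positions):
    """Estimate a 2D gaze-movement vector as the average finite
    difference of successive sub-pixel pupil centres (x, y)."""
    if len(pupil_positions) < 2:
        return (0.0, 0.0)
    pairs = list(zip(pupil_positions, pupil_positions[1:]))
    dx = sum(b[0] - a[0] for a, b in pairs) / len(pairs)
    dy = sum(b[1] - a[1] for a, b in pairs) / len(pairs)
    return (dx, dy)

def eye_state(upper_lid_y, lower_lid_y, open_threshold=0.02):
    """Classify the eye as open or closed from the gap between the
    upper and lower eyelid landmarks (normalised coordinates)."""
    return "open" if (lower_lid_y - upper_lid_y) > open_threshold else "closed"
```

Blink detection then falls out naturally: a transition from "open" to "closed" and back within a short time window is a blink.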
- High degree of precision
- Eye pupil detection and tracking
- Eye states: open & closed
- Eye blinking
- Attention tracking
Facial motion capture
Facial motion capture scans the movements of a person’s face and converts them into computer animation for movies, games, or real-time avatars. It can operate either in real time or on the user’s previously saved data and/or face motion models. Because it is derived from the movements of real people, the technology produces more realistic and nuanced character animation than manual animation.
- Fast execution – the model can operate in real time from as little as a single frame, and each subsequent frame may improve the model.
- Can be integrated into the Face Recognition pipeline in order to develop precise models of users combining both visual similarity and the resemblance of facial gestures, emotions and other motion-related features.
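Converting captured face motion into character animation typically means retargeting tracked expression coefficients onto a character rig's blendshapes. The sketch below shows this idea under stated assumptions: the coefficient names, the 0..1 input range and the per-rig output ranges are hypothetical, not part of the SDK.

```python
def retarget(expression_coeffs, rig_ranges):
    """Map tracked facial-expression coefficients (assumed 0..1) onto a
    character rig whose blendshapes accept different value ranges.

    `rig_ranges` maps a blendshape name to its (lo, hi) range; shapes
    not listed default to the 0..1 range."""
    weights = {}
    for name, c in expression_coeffs.items():
        lo, hi = rig_ranges.get(name, (0.0, 1.0))
        c = max(0.0, min(1.0, c))          # clamp noisy tracker output
        weights[name] = lo + c * (hi - lo)
    return weights
```

Driving a rig this way is what lets one tracked performance animate many different characters.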
3D face motion capture applications
- Face recognition (biometrics)
- Zone of autofocus (photography)
- Unconventional elements of the user interface, games
- Measuring user interaction and engagement with in-app ads
- Behavioural analytics of mobile UI/UX based on eye gaze tracking
- Work-style analysis: eye-blinking frequency, employee attention and concentration, estimation of activity type (typing, reading), calculation of on-screen time spent
2. 3D FACE ANALYSIS
2.1 Face segmentation
Face segmentation is a specific computer vision task aimed at correctly assigning labels of face regions, such as nose, mouth, eye, hair, etc., to each pixel in a face image. Our face segmentation techniques combine complex cascaded machine learning algorithms with colour-model and Monte Carlo approaches, leading to the precise detection of the eyes and their structure (iris, pupil, eyeball, etc.) as well as the nose, ear, cheek, chin, mouth, lip, eyebrow and forehead areas.
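The per-pixel labelling that segmentation produces can be illustrated with a toy example: given a score for each label at each pixel, every pixel gets the highest-scoring label. The label set and score array below are made up for illustration; a real segmenter would output scores from its cascaded models.

```python
import numpy as np

# Hypothetical label set; a real model would use many more face regions.
LABELS = ["skin", "hair", "background"]

def segment(scores):
    """Assign each pixel the label with the highest score.

    `scores` has shape (H, W, num_labels); the result has shape (H, W)
    and contains label strings."""
    idx = np.argmax(scores, axis=-1)
    return np.array(LABELS)[idx]
```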
- Access via a convenient API
- Separate parts of face can be detected for further analysis
2.2 Evaluation of anthropometric parameters
Face detection and face tracking algorithms make it possible to recognise finer facial patterns and recreate the face’s semantics, expressions and anthropometrics: to analyse all of its main parts and, essentially, “understand” it.
Facial anthropometry refers to the measurement of individual facial features. It involves the systematic measurement of the dimensional descriptors of facial size, shape and geometrical properties.
Our novel algorithm automatically detects a set of anthropometric facial fiducial points that are associated with these features.
Features and applications
- Access via a convenient API
- Reconstruction of face geometry, “cleaned” of expressions
- Creation of a person’s caricature avatars on the fly
2.3 Skin and hair colour
Banuba has developed a library for detecting hair and skin colour for iOS.
Detection of hair colour is based on the estimation of geometric and colour-space characteristics of the human face. At the first stage, the image is pre-processed: the face is detected and scaled by the interpupillary distance, and the eye line is aligned. As a result, the image of the person’s face is normalised and aligned.
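The normalisation step described above can be sketched with 2D landmarks: scale by the interpupillary distance and rotate so the eye line is horizontal. The landmark names and coordinate convention are assumptions for illustration.

```python
import math

def normalise_face(landmarks, target_ipd=1.0):
    """Normalise 2D landmarks: scale by the interpupillary distance and
    rotate so the eye line is horizontal.

    `landmarks` maps names to (x, y) and must contain 'left_eye' and
    'right_eye'; the left eye ends up at the origin."""
    lx, ly = landmarks["left_eye"]
    rx, ry = landmarks["right_eye"]
    ipd = math.hypot(rx - lx, ry - ly)
    angle = math.atan2(ry - ly, rx - lx)
    cos_a, sin_a = math.cos(-angle), math.sin(-angle)
    scale = target_ipd / ipd
    out = {}
    for name, (x, y) in landmarks.items():
        x, y = x - lx, y - ly                              # translate
        x, y = x * cos_a - y * sin_a, x * sin_a + y * cos_a  # rotate
        out[name] = (x * scale, y * scale)                 # scale
    return out
```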
The area above the person’s forehead is used for hair colour detection, as it is less sensitive to head turns than other areas of hair. Hair colour is detected by finding sharp intensity patterns in the region of interest and analysing colour features above the hairline. Bald individuals produce no such sharp intensity patterns, so the technology can detect the lack of hair as well.
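The logic just described, sample above the hairline, and report "no hair" when there is no sharp intensity change there, can be sketched as follows. The strip width and edge threshold are arbitrary assumptions, not Banuba's tuned values.

```python
import numpy as np

def hair_colour(image, hairline_row, strip=5, edge_threshold=30.0):
    """Estimate hair colour from the region just above the hairline of a
    normalised face image with shape (H, W, 3).

    Returns the mean RGB of the strip above the hairline, or None when
    there is no sharp intensity change at the hairline (e.g. bald)."""
    above = image[max(0, hairline_row - strip):hairline_row].astype(float)
    below = image[hairline_row:hairline_row + strip].astype(float)
    if above.size == 0 or below.size == 0:
        return None
    # Hair is assumed only where crossing the hairline changes intensity sharply.
    if abs(above.mean() - below.mean()) < edge_threshold:
        return None
    return tuple(above.reshape(-1, 3).mean(axis=0))
```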
For the skin colour detector, colour samples are taken at specific points of the face that other algorithms (based on our face tracking technology) have previously identified as belonging to the facial skin.
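Averaging colour samples at tracker-supplied skin points is straightforward; the sketch below shows the idea, assuming landmarks arrive as (row, column) indices into the image.

```python
def skin_colour(image, landmark_points):
    """Average colour samples at face-tracking landmarks known to lie on
    skin (e.g. cheeks, nose bridge).

    `image` is a nested list of (r, g, b) tuples; `landmark_points` are
    (row, col) indices supplied by the face tracker."""
    samples = [image[r][c] for r, c in landmark_points]
    n = len(samples)
    return tuple(sum(channel) / n for channel in zip(*samples))
```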
- Precise colour measurement
- Visual skin colour and skin tone correction
- Skin-related disorder detection
- Virtual makeup
2.4 Hair style detection
This technology is based on convolutional neural networks. The training set consists of images of men and women, broken down by hairstyle. Training matches an image with the most relevant hairstyle sample.
Semi-supervised metric learning was used to create the data set.
Subsequently, a GAN was trained to expand the data set, after which the final network was trained. To improve quality, a loss function specifically selected for the task was used.
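"Matching an image with the most relevant hairstyle sample" can be illustrated as a nearest-neighbour lookup in the embedding space that metric learning produces. The embeddings and hairstyle names below are toy assumptions; a real system would embed images with the trained network.

```python
import numpy as np

def nearest_hairstyle(embedding, reference_embeddings):
    """Return the hairstyle whose reference embedding is closest (in
    Euclidean distance) to the query image embedding.

    `reference_embeddings` maps hairstyle name -> embedding vector."""
    best, best_dist = None, float("inf")
    for name, ref in reference_embeddings.items():
        dist = float(np.linalg.norm(embedding - ref))
        if dist < best_dist:
            best, best_dist = name, dist
    return best
```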
- Detecting hair style, hair colour
- Changing hair colour
- Haircuts of any shape can be detected, regardless of hair length
- The algorithm distinguishes with pixel precision between hair, face and other parts, such as beard, moustache or glasses
- Matting is done with pixel precision. If the background is visible through the hair, it is detected as part of the hair.
- Virtual hair salon: trying out new hair colours and hair styles
- Detecting hair style for 3D avatar creation
2.5 Emotion and expression recognition
The underlying idea for this technology is that people’s emotions are often reflected on their faces, and the computer’s ability to “read a face” would make it possible to deliver more personalised electronic content.
The variables of the model returned by the recognition algorithm can be used either directly, or transformed into parameters for another model, such as Facial Action Coding System (FACS), which detects muscle movements that correspond to specific emotions.
Our technology allows us to detect six basic emotions: anger, disgust, fear, happiness, sadness, and surprise.
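Mapping recognised emotions to FACS action units can be sketched with a lookup table. The AU combinations below are commonly cited pairings for the six basic emotions, included for illustration only; exact mappings vary between sources, and the function name is an assumption.

```python
# Commonly cited FACS action-unit combinations for the six basic
# emotions (illustrative; exact mappings vary between sources).
EMOTION_TO_AUS = {
    "happiness": [6, 12],
    "sadness": [1, 4, 15],
    "surprise": [1, 2, 5, 26],
    "fear": [1, 2, 4, 5, 7, 20, 26],
    "anger": [4, 5, 7, 23],
    "disgust": [9, 15],
}

def dominant_emotion(scores):
    """Pick the strongest emotion from per-emotion recognition scores
    and return it together with its FACS action units."""
    emotion = max(scores, key=scores.get)
    return emotion, EMOTION_TO_AUS[emotion]
```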
- Real-time detection of anger, disgust, fear, happiness, sadness, and surprise
- Access via a convenient API
- Conversion of data to Paul Ekman’s Facial Action Coding System
- Mood-related content in mobile applications, such as three-dimensional visual masks that reflect users’ feelings while they are communicating in mobile video chats
- Targeted advertisement
- Detection of tiredness and degree of stress
- Use of emotional reactions for a product or content in empathic apps and advertisements
- Creation of human-friendly digital products that react to facial expressions
- Detection of emotional states of patients for health purposes (helping schizophrenic patients)
2.6 Face Beautification
This technology creates visually beautified images of users based on makeup standards. Anthropometric data and facial expressions are analysed and corrected in real time.
- Smoothing of skin
- Correction of face skin tone
- Whitening of eyes and teeth
- Correction of face shape (making it slimmer or wider, increasing/decreasing eye size, changing the shape of the nose and head proportions)
- Changing hair colour
- Improve face symmetry
- Shape and colour of eyebrows
- Correction of lips’ shape
- Virtual make-up
- Face morphing
- Improving the user’s look during video chat
- Plastic surgery: visualising the expected surgery outcome during a consultation with the patient
- Cosmetics: application of makeup / choosing what makeup product fits best
- Demoing effects from application of face skin products
- Fixing smartphone camera distortions
- Make selfies look more visually attractive
3. Separating user image from the background
This technology can be used for the real-time replacement of backgrounds with both static and animated textures. Backgrounds can be changed into a number of optional presets during video calls, or used as a funny effect in advertising.
Banuba has developed a library to separate a person’s image from the background for Apple iOS.
The technology is based on convolutional neural networks, taking colour images as input and producing a probability mask showing whether each pixel belongs to the class “person” or to the class “background.” This approach ensures high performance and good results for real-time background replacement.
The problem of the lack of data sets for separating objects from the background is resolved by creating a small initial data set, which is then expanded through active learning and subsequent fine-tuning.
A correctly selected data set helps to obtain optimal results and high-quality implementation.
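Given the per-pixel "person" probability mask, background replacement reduces to alpha compositing. The sketch below shows this under stated assumptions: the soft-blend slope and threshold are arbitrary illustration values, not the SDK's post-processing.

```python
import numpy as np

def replace_background(frame, person_prob, new_background, threshold=0.5):
    """Composite the person over a new background using the per-pixel
    'person' probability mask produced by the segmentation network.

    Probabilities near the threshold are blended softly so that edges
    (e.g. hair) transition smoothly instead of being cut hard."""
    alpha = np.clip((person_prob - threshold) * 10 + 0.5, 0.0, 1.0)
    alpha = alpha[..., None]            # broadcast over colour channels
    return alpha * frame + (1 - alpha) * new_background
```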
- Augmentation of the raw output with classic computer vision and signal-filtration algorithms: lightweight post-processing (deletion of small contours, precise definition of borders, additional image recognition, etc.)
- Post-processing, including matting, filters and removal of small artefacts
- Portrait mode and bokeh effect
- Replacement of an unsuitable background, noise removal
- Animation effects in the background that are changed by the user as part of interactivity
- Animated emotion-related background
- Adding fancy colours
- Protection of privacy
- 360-degree 2D and 3D backgrounds for educational purposes, e.g. in mixed reality
- “Hollywood effects” on a mobile phone
- Replacing backgrounds for practical purposes (e.g. during a business call) or to entertain (e.g. jungle instead of a wall)
- Editing “boring” backgrounds to create perfect videos
- Removing unwanted objects or people from videos