The Best Tools for Machine Learning Model Serving

An overview and analysis of serving systems and deployment methods for Machine Learning and AI models.

As the prevalence of AI in various industries increases, so does the need to optimize machine learning model serving. As a machine learning engineer, I’ve seen that training models is just one part of the ML journey. Equally important is the careful selection of deployment strategies and serving systems.

In this article, we’ll delve into the importance of selecting the right tools for machine learning model serving, and talk about their pros and cons. We’ll explore various deployment options, serving systems like TensorFlow Serving, TorchServe, Triton, Ray Serve, and MLflow, and also the deployment of specific models such as large language models (LLMs). I’ll also provide some thoughts and recommendations for navigating this ever-evolving landscape.

Machine Learning Models Serving Then and Now

When I first began my journey in the world of machine learning, the landscape was constantly shifting. The frameworks being actively developed and used at the time included Caffe, Theano, TensorFlow (Google) and PyTorch (Meta), all vying for their place in the world of AI. As time has passed, the competition has become more and more lopsided, with TensorFlow and PyTorch leading the way. While TensorFlow has remained the more popular choice for production-ready models, PyTorch has been steadily gaining in popularity, particularly within research circles, for its faster, more intuitive prototyping capabilities.

While there are hundreds of libraries available to train and optimize models, the most popular frameworks such as TensorFlow, PyTorch and Scikit-Learn are all based on the Python programming language. Python is often chosen due to its simplicity and the vast number of libraries for data manipulation. However, it is not the fastest language and can present problems with parallel processing, threads and the GIL. Additionally, specialized libraries such as spaCy and PyG are available for specific tasks, such as Natural Language Processing (NLP) and Graph Analysis, respectively. The focus was, and partially still is, on the optimization of models and architectures. On the other hand, the large-scale adoption of AI brings more and more problems in serving machine learning models in production.

Nowadays, even more complex models like large language models (LLMs such as GPT, LLaMA or Bard) and multi-modal models are in fashion, which puts greater pressure on optimal model deployment, infrastructure and storage capacity. Making machine learning model serving and deployment effective and cheap is a big problem. Even companies like Microsoft or NVIDIA are actively working on solutions that will cut its costs. So let’s look into some of the best options that we as developers currently have.

The Machine Learning and DevOps Challenges

Being a Machine Learning Engineer, I can say that training a model is just a small part of the whole lifecycle. Data preparation, the deployment process and running the model smoothly for numerous customers are a daily challenge and a major part of the job.

Deployment Strategies

In addition to allocating GPU/CPU resources and managing inference speed, a company deploying ML models must also consider the deployment strategy for the trained model. You could be deploying the ML model as an API, running it in a container, or using a serverless platform. Each of these options comes with its own set of benefits and drawbacks, so carefully considering the best approach is essential. When we have a trained model, there are several options on how to use it:

  • Deploy it as an API endpoint, sending data in the request and getting results immediately in response. This approach is suitable for faster models that are able to process the data in just a few seconds.
  • Deploy it as an API endpoint, but return just a promise or asynchronous response from the model. This is great for computationally intensive models that can take minutes or hours of processing. For example, generative models and upscaling models are slow and require this approach.
  • Use a system that is able to serve it for you.
  • Use the model locally on your data.
  • Deploy models on smartphones or IoT devices with a feed from local sensors.

Other Challenges

The complexity of machine learning projects grows with variables such as:

  • The number of models – It is common practice to use multiple models. For example, at this moment, there are tens of thousands of different ML models on the Ximilar platform.
  • Model versions – You can train each of your models on different training data (part of the dataset) and mark it as a different version. Model versioning is great if you want to A/B test your ML model, tune your model performance, and for continuous model training.
  • Format of models – You can potentially train and save your ML models in various formats. For instance, .h5, which is a Keras/TensorFlow format, .pt (PyTorch) or .onnx for ONNX Runtime. Usually, each framework supports only specific formats.
  • The number of frameworks – Served ML models could be trained with different frameworks and their versions.
  • The number of nodes (servers) – Models can be hosted on one or multiple servers, and the serving system should be able to intelligently load balance the requests across servers so that none of them is throttled.
  • Model storage/registry – You need to store the ML models in some database or storage, such as AWS S3 or local storage.
  • Speed/performance – The loading time of models from storage can be critical and can cause slow inference per sample.
  • Ease of use – Calling the model via REST API or gRPC requests, single or batch inference.
  • Hardware specification – ML models can be deployed on Edge devices or PCs with various architectures.
  • GPUs vs CPUs and libraries – Some models must be used only on CPUs and some require a GPU card.

Our Approach to the Machine Learning Model Serving

Several systems were developed to tackle these problems. Serving and deploying machine learning models has come a long way since we founded Ximilar in 2016. Back then, no system was capable of effectively serving hundreds of neural networks for inference.

So, we decided to build our own system for machine learning model serving, and today it forms the backbone of our machine-learning platform. As the use of AI becomes more widespread in companies, newer systems such as TensorFlow Serving emerge quickly to meet the increasing demand.

Which Framework Is The Best?

The Battle of Machine Learning Frameworks

Nowadays, each big tech company has its own solution for machine learning model serving and training. To name a few: TorchServe and AITemplate by Meta (Facebook) for PyTorch, TF Serving by Google for TensorFlow, ONNX Runtime by Microsoft, Triton by NVIDIA, Multi-Model-Server by Amazon, and many others like BentoML or Ray.

There are also tens of formats that you can save your ML model in. TensorFlow alone is able to save into the .h5, .pb, SavedModel or .tflite formats, each of them serving a different purpose. For example, TensorFlow Lite is great for smartphones. It also loads very fast, so the availability of the model is great. However, it supports only a limited set of operations, and more modern architectures cannot be converted to it.
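
For illustration, here is a minimal sketch (assuming a trivial Keras model and TensorFlow 2.x; file names are illustrative) of saving one model into several of these formats:

import tensorflow as tf

# A trivial model just for demonstration
model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])

model.save("model.h5")        # legacy Keras HDF5 (.h5)
model.save("exported_model")  # TensorFlow SavedModel directory

# Convert to TensorFlow Lite for smartphones and edge devices
converter = tf.lite.TFLiteConverter.from_keras_model(model)
with open("model.tflite", "wb") as f:
    f.write(converter.convert())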

Machine learning model serving: each big tech company has its own solution for training and serving machine learning models.

You can also try to convert models from PyTorch or TensorFlow to the TensorRT or OpenVINO formats. The conversion usually works with basic, widely used architectures. TensorRT is great if you are deploying ML models on a Jetson Nano or Xavier. On Intel servers, you can achieve a performance boost via OpenVINO conversion or the Neural Magic library.

The ONNX Format

One notable thing is the ONNX format. ONNX is not a library for training your machine learning models; it is an open format for storing them. After training a model, for example in TensorFlow, you can convert it to the ONNX format. You are able to run this converted model via ONNX Runtime on almost any platform, programming language and CPU architecture, and with your preferred hardware acceleration. Sometimes serving a model requires a specific version of libraries, which is why ONNX can solve a lot of problems.
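
As an example, here is a hedged sketch of exporting a PyTorch model to ONNX and running it with ONNX Runtime (the model choice and tensor shapes are illustrative):

import numpy as np
import torch
import torchvision
import onnxruntime as ort

# Export a model to the ONNX format
model = torchvision.models.mobilenet_v2().eval()
dummy_input = torch.randn(1, 3, 224, 224)
torch.onnx.export(model, dummy_input, "model.onnx",
                  input_names=["input"], output_names=["output"])

# Run the exported model via ONNX Runtime
session = ort.InferenceSession("model.onnx")
image = np.random.rand(1, 3, 224, 224).astype(np.float32)
outputs = session.run(None, {"input": image})
print(outputs[0].shape)  # (1, 1000) class scores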

Exploration is Key

There are a lot of options for ML model training, saving, conversion and deployment. Every library has its pros and cons: some of them are easy to use for training and development, while others are specialized for specific platforms or specific fields (computer vision, recommender systems or NLP).

I would recommend investing some time in exploring the frameworks and systems before deciding which one you would like to lock into. The competition is rough in this field, and every company tries to be as innovative as possible to keep up with the others. Even the Chinese company Baidu has developed its own solution, called PaddlePaddle. At the end of the article, I will give some recommendations on which frameworks and serving systems you should use and when.

The Best Machine Learning Serving Tools

OK, let’s say that you trained your own model or downloaded one that has already been trained. Now you would like to deploy a machine-learning model in production. Here are a few options that you can try.

If you don’t know how to train a machine learning model, you can start with this tutorial by PyTorch.

Deploy ML Models With API

If you have one or a few models, you can build your own system for ML model serving. With Python and libraries such as Flask or Django, there is a straightforward way to develop a simple REST API. When the web service starts, it loads the model in the background and then every incoming request will call the model on the incoming data.

It can get problematic if you want to work effectively with GPU cards and handle parallel requests. I would recommend packaging the system into Docker and then running it in Kubernetes.

With Kubernetes, Docker and smart load balancing such as HAProxy, such a system can potentially scale to bigger volumes. Java and Go are also good languages for deploying ML models.

Here is a simple tutorial serving a scikit-learn model as a REST API with Flask.
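
A minimal sketch of this pattern could look as follows (assuming a scikit-learn model saved with joblib; the file name and route are illustrative):

from flask import Flask, jsonify, request
import joblib

app = Flask(__name__)

# Load the model once at startup, not per request
model = joblib.load("model.joblib")

@app.route("/predict", methods=["POST"])
def predict():
    features = request.get_json()["features"]
    prediction = model.predict([features])
    return jsonify({"prediction": prediction.tolist()})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)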

Now let’s have a look at the open-source serving systems that you can use out of the box, usually with a small piece of code or no code at all.

TensorFlow Serving

GitHub | Docs

TensorFlow Serving is a modern serving system for TensorFlow ML models. It’s a part of TensorFlow Extended developed by Google. The recommended way of using the system is via Docker.

Simply run the docker pull tensorflow/serving command (optionally tensorflow/serving:latest-gpu if you need GPU support). Then run the image via Docker:

docker run -p 8501:8501 \
  --mount type=bind,source=/path/to/my_model/,target=/models/my_model \
  -e MODEL_NAME=my_model -t tensorflow/serving

Now that the system is serving your model, you can query it with gRPC or REST calls. For more information, read the documentation. TensorFlow Serving works best with the SavedModel format. The model should define its signature_def_map, which defines the inputs and outputs of the model. If you would like to dive deeper into the system, I recommend the videos by the team itself.
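
A REST prediction request then looks roughly like this (the input shape depends on your model’s signature):

curl -X POST http://localhost:8501/v1/models/my_model:predict \
  -H 'Content-Type: application/json' \
  -d '{"instances": [[1.0, 2.0, 3.0, 4.0]]}'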

In my opinion, TensorFlow Serving is great with simple models and just a few versions. The documentation, however, could be simpler. With advanced architectures, you will need to define custom operations, which is a big disadvantage if you have a lot of models with more modern operations.

TorchServe

GitHub | Docs

TorchServe is a more modern system than TensorFlow Serving. The documentation is clean, and it supports basically everything that TF Serving does; however, this one is for PyTorch models. Before serving a PyTorch model via TorchServe, you need to convert it to a .mar package. Basically, the .mar package contains the model name, version, architecture and the actual weights of the model. Installation and running are also possible via Docker, very similar to TensorFlow Serving.
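
Creating the .mar package and starting the server looks roughly like this (a sketch with illustrative file names; the right handler depends on your model type):

# Package the model weights into a .mar archive
torch-model-archiver --model-name my_model --version 1.0 \
  --serialized-file my_model.pt --handler image_classifier \
  --export-path model_store

# Start TorchServe and register the model
torchserve --start --model-store model_store --models my_model=my_model.mar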

I personally like the management of the models: you are able to simply register new models by sending API requests, list models and query statistics. I find TorchServe very simple to use. Both REST API and gRPC are available. If you are working with pure PyTorch models, then TorchServe is the recommended way.

Triton

GitHub | Docs

Both of the serving systems mentioned above are tightly bound to the frameworks of the models they are able to serve. That is probably why Triton has a big advantage over them, since it can serve both TensorFlow and PyTorch models. It is also able to serve the OpenVINO, ONNX and TensorRT formats! That means it supports all the major formats in the machine learning field. Even though NVIDIA developed it, it doesn’t require a GPU card and can also run on CPUs.

To run Triton, simply pull it from the Docker repository via the docker pull nvcr.io/nvidia/tritonserver command. The Triton server is able to load models from a specific directory called model_repository. Each model is defined with a configuration; in this configuration, there is a platform setting that defines the model format, for example “tensorflow_graphdef” or “onnxruntime_onnx”. This way, Triton knows how to run specific models.
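
A minimal model repository for an ONNX model could look like the following sketch (names and tensor shapes are illustrative):

model_repository/
  my_onnx_model/
    config.pbtxt
    1/
      model.onnx

# config.pbtxt
name: "my_onnx_model"
platform: "onnxruntime_onnx"
max_batch_size: 8
input [
  { name: "input" data_type: TYPE_FP32 dims: [ 3, 224, 224 ] }
]
output [
  { name: "output" data_type: TYPE_FP32 dims: [ 1000 ] }
]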

The documentation is not super easy to read (it consists mostly of GitHub README files) because the project is in very active development. Otherwise, working with the models is similar to the other serving systems, meaning calling models via gRPC or REST.

Ray Serve

GitHub | Docs

Ray is a general-purpose system for scaling machine learning workloads. Ray Serve, its model-serving library, focuses on serving models and provides the primitives for you to build your own ML platform on top.

Ray Serve offers a more Pythonic way of creating your own serving system. It is framework-agnostic, and anything that can be run via Python can also be run with Ray. Basically, it looks as simple as Flask: you define a simple Python class for your model and decorate it with a route prefix handler. Then you just call it via a REST API request.

import requests
from starlette.requests import Request
from typing import Dict

from ray import serve

# 1: Define a Ray Serve deployment.
@serve.deployment(route_prefix="/")
class MyModelDeployment:
    def __init__(self, msg: str):
        # Initialize model state: could be very large neural net weights.
        self._msg = msg

    def __call__(self, request: Request) -> Dict:
        return {"result": self._msg}

# 2: Deploy the model.
serve.run(MyModelDeployment.bind(msg="Hello world!"))

# 3: Query the deployment and print the result.
print(requests.get("http://localhost:8000/").json())

If you want to have more control over the system, Ray is a great option. There is also the Ray Clusters library, which is able to deploy the system on your own Kubernetes cluster, AWS or GCP, with configurable autoscaling.

MLflow

MLflow is an open-source platform for the whole ML lifecycle: from training to evaluation, deployment, tracking, model monitoring and a central model registry.

MLflow offers a robust API and several language bindings for the whole management of the machine learning model’s lifecycle. There is also a UI for tracking your trained models. MLflow is really a mature package with a whole bundle of components that your team can use.
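
As a small taste of the API, here is a hedged sketch of logging a scikit-learn model to MLflow so it can later be served:

import mlflow
import mlflow.sklearn
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

with mlflow.start_run():
    model = LogisticRegression(max_iter=200).fit(X, y)
    mlflow.log_metric("train_accuracy", model.score(X, y))
    mlflow.sklearn.log_model(model, "model")

# The logged model can later be served from the command line, e.g.:
#   mlflow models serve -m runs:/<run_id>/model -p 5000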

Other Useful Tools for Machine Learning Model Serving

  • Multi-Model-Server is a similar system to the previous ones. Developed by the Amazon AWS team, the system is able to run models trained with MXNet or converted via ONNX.
  • BentoML is a project very similar to MLflow. There are many different tools that data scientists can use for the training and deployment processes. The UI looks a bit more modern. BentoML is also able to automatically generate Docker images for your models.
  • KServe is a simple system for managing and scaling models on your Kubernetes cluster. It solves deployment and autoscaling, and provides a standardized inference protocol across ML frameworks.

Cloud Options of AWS, GCP and Azure

Of course, every big tech player provides cloud platforms to host and serve your machine learning models. Let’s have a quick look at a few examples.

Microsoft is a big supporter of ONNX, so with Azure Machine Learning services, you are able to deploy your models to the cloud via Python or the Azure CLI. The process requires an entry script in Python with two methods: init for the initialization of your model and run for the inference. You can find the entire workflow in the Azure development documentation.
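
A minimal entry script could look like the following sketch (assuming a scikit-learn model saved with joblib; the AZUREML_MODEL_DIR environment variable is, per Azure docs, set by the Azure ML runtime):

import json
import os
import joblib

model = None

def init():
    # Called once when the service starts: load the model
    global model
    model_path = os.path.join(os.getenv("AZUREML_MODEL_DIR", "."), "model.joblib")
    model = joblib.load(model_path)

def run(raw_data):
    # Called per request: parse the input and return predictions
    data = json.loads(raw_data)["data"]
    return model.predict(data).tolist()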

The Google Cloud Platform (GCP) has good support for TensorFlow, as it is their native framework. However, Docker deployment is available, so other frameworks can be used too. There are multiple ways to achieve the deployment. The classic way is to use the AI Platform prediction tool or Google Cloud Run. There is also a serverless HTTP endpoint/function, which serves a model stored in a Google Cloud Storage bucket. You define your function in Python, with the loading of the model and a prediction method.

Amazon Web Services (AWS) also contains multiple options for the ML deployment process and serving. The specialized system for machine learning is Amazon Sagemaker.

All the big platforms allow you to create your own virtual server instances, create Kubernetes clusters, and use any of the systems/frameworks mentioned earlier. Nevertheless, you need to be very careful, because it can get really pricey. There are also smaller players on the market, such as Banana, Seldon and Comet ML, for training, serving & deployment. I personally don’t have experience with them, but they are becoming more popular.

Large Language (LLMs) and Multi-Modal Models in Production

With the introduction of GPT by OpenAI, a new class of AI models was introduced – the large language models (LLMs). These models are extremely big, trained on massive datasets, and deployed on infrastructure that requires a whole datacenter to run. “Smaller” models – usually open-source versions – are released, but they also require a lot of computational resources and modern servers to run smoothly.

Recently, several serving systems for these models were developed:

  • OpenLLM by BentoML is a nice system that supports almost all open-source models like Llama 2. You can just pick one of the models and run the following commands to start serving and query the results:

openllm start opt
export OPENLLM_ENDPOINT=http://localhost:3000
openllm query 'Explain to me the difference between "further" and "farther"'
  • The vLLM project is a Python library that can help you with the deployment of an LLM as an API server. What is great is that it provides an OpenAI-compatible server, so you can easily switch from the paid OpenAI service to an open-source variant without modifying the code on the client (see the sketch after this list). This project is being developed at UC Berkeley, and it integrates new techniques for fast inference of LLMs.

  • SkyPilot is a great option if you want to run LLMs on cloud providers such as AWS, Google Cloud or Azure. Because running these models is costly, SkyPilot is able to pick the cheapest provider automatically and launch the model as an endpoint.
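
To illustrate the OpenAI-compatible interface of vLLM, here is a hedged sketch (assuming the server was started with python -m vllm.entrypoints.openai.api_server --model facebook/opt-125m; the model name is just an example):

import requests

response = requests.post(
    "http://localhost:8000/v1/completions",
    json={
        "model": "facebook/opt-125m",
        "prompt": "Explain the difference between 'further' and 'farther': ",
        "max_tokens": 64,
    },
)
print(response.json()["choices"][0]["text"])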

Ximilar AI Platform

Free Login | Docs

Last but not least, you can use our codeless machine-learning platform. Instead of writing a lot of code, training and deploying an ML model by yourself, you can try it in the Ximilar App. Training image classification and object detection models can be done both in the browser App and via the API. There is every tool you would need in the ML model development stage, such as training data/image management, labelling tools, evaluation of your models on testing and training datasets, performance metrics, explanation of models on specific images, and so on.

Ximilar’s computer vision platform enables you to develop AI-powered systems for image recognition, visual quality control, and more, without any knowledge of coding or machine learning. You can combine the services as you wish and upgrade any of them anytime.

Once your model is trained, it is deployed as a REST API endpoint. It can be connected into a workflow of multiple machine learning models working together, with conditions like if-else statements. The major benefit is that you just connect your system to the API and query the results. All the training and serving problems are solved by us. In the end, you will save a lot of costs, because you don’t need to own or rent infrastructure or serving systems, or hire a specialized machine learning engineering team.

We built a Ximilar Platform so that businesses from e-commerce, healthcare, manufacturing, real estate and other areas could simply develop their own AI models without coding and with a reasonable budget. For example, on the following screen, you can see our task management for the trading cards collector community.

We and our customers use our platform for the training of machine learning models. Together with our own system for machine learning model serving, it is an all-in-one solution for ML model deployment.

The great thing is that everything is manageable via REST API requests with JSON responses. Here is a simple curl command to query all models in production:

curl --request GET \
  --url https://api.ximilar.com/recognition/v2/task/ \
  --header 'Content-Type: application/json' \
  --header 'authorization: Token APITOKEN'

Deployment of ML Models is Science

There are a lot of systems that try to make deployment and serving easy. The topic of deployment & serving is broad, with many choices for hardware infrastructure, DevOps, programming languages, system development, costs, storage, and scaling, so it is not easy to pick just one. If you would like to dig deeper, the documentation and tutorials of the systems mentioned above are a good place to start.

My Final Tips & Recommendations

Pick a good framework to start with

Having done machine learning for more than 10 years, my advice is to start by picking a good framework for model development. In my opinion, the best choice right now is PyTorch. It is easy to use and supports a lot of state-of-the-art architectures.

I used to be a fan of TensorFlow for a long time, but over time, its developers were not able to integrate modern approaches. Backward compatibility is also often broken, and the quality of the code is getting worse, which leads to more and more bugs in the framework.

Save your models in different formats

Second, save your models in different formats. I would recommend using ONNX and OpenVINO here. You never know when you will need them. This happened to me a few times: we needed to upgrade the server and systems (our production environment), but the new versions of the libraries stopped supporting the specific format of the model, so we had to switch to a different one.

Pick a serving system suitable to your needs

If you are a small company, then Ray Serve is a good option. Bigger companies, on the other hand, have complex requirements for development and robust infrastructure. In that case, I would recommend picking a more complex system like MLflow. If you would like to serve the models in the cloud, then look at Multi-Model-Server. The choice really depends on the use case. If you don’t want to bother with any of this, then try our Ximilar Platform, which provides model optimization, model validation, data storage and model deployment as an API.

I will keep this article updated, and if a promising new serving system appears, I will be more than happy to mention it here. After all, machine learning is about constant progress, and that is one of the things I like about it the most.

Predict Values From Images With Image Regression

With image regression, you can assess the quality of samples, grade collectible items or rate & rank real estate photos.

We are excited to introduce the latest addition to Ximilar’s Computer Vision Platform. Our platform is a great tool for building image classification systems, and now it also includes image regression models. They enable you to extract values from images with accuracy and efficiency and save your labor costs.

Let’s take a look at what image regression is and how it works, including examples of the most common applications. More importantly, I will tell you how you can train your own regression system on a no-code computer vision platform. As more and more customers seek to extract information from pictures, this new feature is sure to provide Ximilar’s customers with the tools they need to stay ahead of the curve in today’s highly competitive AI-driven market.

What is the Difference Between Image Categorization and Regression?

Image recognition models are ideal for the recognition of images or objects in them, their categorization and tagging (labelling). Let’s say you want to recognize different types of car tyres or their patterns. In this case, categorization and tagging models would be suitable for assigning discrete features to images. However, if you want to predict any continuous value from a certain range, such as the level of tyre wear, image regression is the preferred approach.

Image regression is an advanced machine-learning technique that can predict continuous values within a specific range. Whenever you need to rate or evaluate a collection of images, an image regression system can be incredibly useful.

For instance, you can define a range of values, such as 0 to 5, where 0 is the worst and 5 is the best, and train an image regression task to predict the appropriate rating for given products. Such predictive systems are ideal for assigning values to several specific features within images. In this case, the system would provide you with highly accurate insights into the wear and tear of a particular tyre.

Predicting the level of tyre wear from an image is a use case for an image regression task, while a categorization task can recognize the pattern of the tyre.

How to Train Image Regression With a Computer Vision Platform?

Simply log in to Ximilar App and go to Categorization & Tagging. Upload your training pictures and under Tasks, click on Create a new task and create a Regression task.

Creating an image regression task in Ximilar App.

You can train regression tasks and test them via the same front end or with API. You can develop an AI prediction task for your photos with just a few clicks, without any coding or any knowledge of machine learning.

This way, you can create an automatic grading system able to analyze an image and provide a numerical output in the defined range.

Use the Same Training Data For All Your Image Classification Tasks

Both image recognition and image regression methods fall under the image classification techniques. That is why the whole process of working with regression is very similar to categorization & tagging models.

Working with image regression model on Ximilar computer vision platform.

Both technologies can work with the same datasets (training images), and inputs of various image sizes and types. In both cases, you can simply upload your data set to the platform, and after creating a task, label the pictures with appropriate continuous values, and then click on the Train button.

Apart from a machine learning platform, we offer a number of AI solutions that are field-tested and ready to use. Check out our public demos to see them in action.

If you would like to build your first image classification system on a no-code machine learning platform, I recommend checking out the article How to Build Your Own Image Recognition API. We defined the basic terms in the article How to Train Custom Image Classifier in 5 Minutes. We also made a basic video tutorial:

Tutorial: train your own image recognition model with Ximilar platform.

Neural Network: The Technology Behind Predicting Range Values on Images

The simplest technique for predicting float values is linear regression, which can be further extended to polynomial regression. These two statistical techniques work great on tabular input data. However, when it comes to predicting numbers from images, a more advanced approach is required. That’s where neural networks come in. Mathematically speaking, a neural network “f” can be trained to predict a value “y” for a picture “x”, i.e., “y = f(x)”.

Neural networks can be thought of as approximations of functions that we aim to identify through optimization on training data. The most commonly used NNs for image-based predictions are convolutional neural networks (CNNs), vision transformers (ViTs), or a combination of both. These powerful tools analyze pictures pixel by pixel and learn the relevant features and patterns that are essential for solving the problem at hand.

CNNs are particularly effective in picture analysis tasks, as they are able to detect features at different spatial scales and orientations. Meanwhile, ViTs have been gaining popularity due to their ability to learn visual features without being constrained by spatial invariance. When used together, these techniques can provide a comprehensive approach to image-based predictions, and we can use them to extract the most relevant information from images.
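
For illustration, a minimal image regression model in Keras could look like the following sketch (assuming 128×128 RGB images labelled with float values; this is a generic example, not the architecture used on our platform):

import tensorflow as tf

# CNN backbone with a single linear output neuron for the predicted value
base = tf.keras.applications.MobileNetV2(
    input_shape=(128, 128, 3), include_top=False,
    pooling="avg", weights="imagenet")
model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(1, activation="linear"),  # y = f(x), a float value
])

# Mean squared error is the typical loss for regression
model.compile(optimizer="adam", loss="mse", metrics=["mae"])
# model.fit(train_images, train_values, epochs=10)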

What Are the Most Common Applications of Value Regression From Images?

Estimating Age From Photos

Probably the most widely known use case of image regression among the public is age prediction. You can come across it on social media platforms and mobile apps such as Facebook, Instagram, Snapchat, or FaceApp. They apply deep learning algorithms to predict a user’s age based on their facial features and other details.

While image recognition provides information on the object or person in the image, the regression system tells us a specific value – in this case, the person’s age.

Needless to say, these plugins are not always correct and can sometimes produce biased results. Despite this limitation, image regression models are gaining popularity on social sites and in apps.

Ximilar already provides a face-detection solution. Models such as age prediction can be easily trained and deployed on our platform and integrated into your system.

Value Prediction and Rating of Real Estate Photos

Pictures play an essential part on real estate sites. When people are looking for a new home or investment, they are navigating through the feed mainly by visual features. With image regression, you are able to predict the state, quality, price, and overall rating of real estate from photos. This can help with both searching and evaluating real estate.

Predicting rating and price (regression) for household images with image regression.

Custom recognition models are also great for the recognition & categorization of the features present in real estate photos. For example, you can determine whether a room is furnished, what type of room it is, and categorize the windows and floors based on their design.

Additionally, a regression can determine the quality or state of floors or walls, as well as rank the overall visual aesthetics of households. You can store all of this information in your database. Your users can then use such data to search for real estate that meets specific criteria.

Image classification systems such as image recognition and value regression are ideal for real estate ranking. Your visitors can search the database with the extracted data.

Determining the Degree of Wear and Tear With AI

Visual AI is increasingly being used to estimate the condition of products in photos. While recognition systems can detect individual tears and surface defects, regression systems can estimate the overall degree of wear and tear of things.

A good example of an industry that has seen significant adoption of such technology is insurance. For example, startups like Lemonade or Root use AI when processing insurance claims.

With custom image recognition and regression methods, it is now possible to automate the process of insurance claims. For instance, a visual AI system can indicate the seriousness of damage to cars after accidents or assess the wear and tear of various parts such as suspension, tires, or gearboxes. The same goes with other types of insurance, including households, appliances, or even collectible & antique items.

Our platform is commonly utilized to develop recognition and detection systems for visual quality control & defect detection. Read more in the article Visual AI Takes Quality Control to a New Level.

Automatic Grading of Antique & Collectible Items Such as Sports Cards

Apart from car insurance and damage inspection, recognition and regression are great for all types of grading and sorting systems, for instance on price comparators and marketplaces of collectible and antique items. Deep learning is ideal for the automatic visual grading of collector items such as comic books and trading cards.

By leveraging visual AI technology, companies can streamline their processes, reduce manual labor significantly, cut costs, and enhance the accuracy and reliability of their assessments, leading to greater customer satisfaction.

Automatic Recognition of Collectibles

Ximilar built an AI system for the detection, recognition and grading of collectibles. Check it out!

Food Quality Estimation With AI

Biotech, Med Tech, and Industry 4.0 also have a lot of applications for regression models. For example, they can estimate the approximate level of fruit & vegetable ripeness or freshness from a simple camera image.

The grading of vegetables by an image regression model.

For instance, this Japanese farmer is using deep learning for cucumber quality checks. Are you looking for quality control or estimation of the size and other parameters of olives, fruit, or meat? You can easily create a system tailored to these use cases without coding on the Ximilar platform.

Build Custom Evaluation & Grading Systems With Ximilar

Ximilar provides a no-code visual AI platform accessible via App & API. You can log in and train your own visual AI without the need to know how to code or have expertise in deep learning techniques. It will take you just a few minutes to build a powerful AI model. Don’t hesitate to test it for free and let us know what you think!

Our developers and annotators are also able to build custom recognition and regression systems from scratch. We can help you with the training of the custom task and then with the deployment in production. Both custom and ready-to-use solutions can be used via API or even deployed offline.

Ximilar Introduces a Brand New App

Ximilar introduces a new user interface for training custom image recognition, object detection and similarity search.

An update is never late, nor is it early. It arrives precisely when we mean it to. After tuning up the back end for four years, the time has come to level up the front end of our App as well. We tested multiple ways, got valuable feedback from our users, and now we’re happy to introduce a new interface. It is more user-friendly, there are richer options, and the orientation in the growing number of our services is easier.

All Important Things at Hand

Ximilar provides a platform for visual AI, where anyone can create, train and deploy custom-made visual AI solutions based on the techniques of machine learning and computer vision. The platform is accessible via API and a web-based App, where users from all around the world work with both ready-to-use and custom solutions. They implement them into their own apps, quality control or monitoring systems in factories, healthcare tools and so on.

We created the new interface to adapt to the ever-increasing number of services we provide. It now makes better use of both the dashboard and sidebar, showcases useful articles and guides, and provides more support. So, let’s take a look at the major new features!

Service Categories & News

We grouped our services based on how they work with data and the degree of possible customization. After you log into the application, you will see the cards of four service groups with short descriptions on the dashboard. Below them, you can see the newest articles from our Blog, where we publish a lot of useful tips on how to create and implement custom visual AI solutions.

The service groups are following:

  1. Ready-to-use Image Recognition includes all the services that you can use straight away, without the need for additional training or custom tags and labels. In principle, these services analyze your data (i.e., your image collection) and provide you with information based on image recognition, object detection, analysis of colors & styles, etc. Here you will find Fashion Tagging, Home Decor Tagging, Photo Tagging and Dominant Colors.
  2. Custom Image Recognition allows you to train custom Categorization & Tagging and Object Detection models. Flows, which enable you to combine the models, also fall under this category. To prepare the training data for object detection seamlessly and fast, you can use our own tool Annotate.
  3. Visual Search encompasses all services able to identify, analyze and compare visually similar content. Image Similarity can find, compare and recommend visually similar images or products. You can also use Image Matching to identify duplicates or near-duplicates in your collection, or create a fully custom visual search. Fashion Search is a complex service based on visual search and fashion tagging for apparel image collections.
  4. Image Tools are online tools based on computer vision and machine learning that will, when provided with an image, modify it. You can then either use the result or implement these image tools in your Flows. Here you will find Remove Background and Image Upscaler.

Do you want to learn more about AI and machine learning? Check the list of The Best Resources on Artificial Intelligence and Machine Learning.

Discover Services

Within the service groups, you can now browse all our services, including the ones that are not in your pricing scheme. Every service dashboard features a service overview and links to documentation, useful guides, case studies & video tutorials.

Do you want to know what you pay for when using our App? Check our article on API credit packs or the documentation.

Guides & Help at Hand

The sidebar underwent some major changes. It now displays all service groups and services. At the bottom, you will find the Guides & Help section with all necessary links to the beginner App Overview tutorial, Guides, Documentation & Contacts in case you need help.

How to make the most of a computer vision solution? Our guides are packed with useful tips & tricks, as well as first-hand experience of our machine learning specialists.

Customize the Sidebar With Favorites

Since each use case is highly specific, our users usually use a small group of services or only one service at a time. That is why you can now pin your most-used services as Favorites.

When you first log into the new front end, all of your previously used services will be marked as Favorites. You can then choose which of them will stay on top.

What’s next?

This front-end update is just a first step out of many we’ve been working on. We focus on adding some major features to the platform, such as explainability, as well as custom image regression models. The Ximilar platform provides one of the most advanced Visual AI tools with API on the market, and you can test them for free. Nevertheless, the key to the improvement of our services and App are your opinions and user experience. Let us know what you think!

Ximilar Amongst Growth Stars of Deloitte Technology Fast 50 CE 2021

Ximilar ranked amongst the Growth Stars in the Deloitte Technology CE Fast 50, which recognizes fast-growing technology companies.

The Deloitte Technology Fast 50 in Central Europe recognizes and profiles fast-growing public or private technology companies in the region. They also compare the fastest growing young companies, so-called Growth Stars, and companies with significant impact on the business, environment, society, and diversity amongst employees, called Impact Stars.

Czech Companies Dominated the Ranking

The Deloitte Technology CE Fast 50 compares the growth of technology companies in the fields of communications, environmental technology, fintech, hardware, healthcare, life sciences, media & entertainment, and software. This year, once again, Czech companies dominated the ranking. Four out of the five best companies in the CE Fast 50 were from the Czech Republic, led by a new record holder, FTMO, beating the company in second place by a staggering margin of more than 50 %.

Ximilar Ranked 8th Amongst Growth Stars in the Czech Republic

Apart from the Fast 50, Deloitte also compares and awards the Companies to Watch. The Growth Stars ranking is for companies that are usually too young to be eligible for the Fast 50. Generally, they should have been established by January 2018, have their own proprietary technology, and showcase consistent and natural growth with a certain base operating revenue (read the full selection criteria here). Ximilar ranked 8th amongst the Czech Growth Stars and 22nd overall.

Ximilar ranked 8th amongst Czech Growth Stars and 22nd in total

We would like to congratulate all the companies, which ranked in the Fast 50, Growth Stars and Impact Stars, and wish them good luck in the future! The awards ceremony was streamed live on the platform of Seznam Zprávy. You can see the complete list of laureates here.

Do you want to give your business a boost? Let’s talk.

How to deploy object detection on Nvidia Jetson Nano

We developed a computer vision system for object detection, counting, and tracking on Nvidia Jetson Nano.

At the beginning of summer, we received a request for a custom project for a camera system in a factory located in Africa. The project was about detecting, counting, and visual quality control of the items on the conveyor belts in a factory with the help of visual AI. So we developed a complex system with neural networks on a small computer called Jetson Nano. If you are curious about how we did it, this article is for you. And if you need help with building similar solutions for your factory, our team and tools are here for you.

What is NVIDIA Jetson Nano?

There were two reasons why using our API was not an option. First, the factory has unstable internet connectivity. Also, the entire solution needs to run in real time. So we chose to experiment with embedded hardware that can be deployed in such an environment, and we are very glad that we found Nvidia Jetson Nano.


Jetson Nano is an amazing small computer (embedded or edge device) built for AI. It allows you to do machine learning in a very efficient way with low power consumption (about 5 watts). It can be a part of IoT (Internet of Things) systems, runs Ubuntu Linux, and is suitable for simple robotics or computer vision projects in factories. However, if you know that you will need to detect, recognize and track tens of different labels, choose a higher version of the Jetson embedded hardware, such as Xavier. It is a much faster device than the Nano and can solve more complex problems.

What is Jetson Nano good for?

Jetson is great if:

  • You need a real-time analysis
  • Your problem can be solved with one or two simple models
  • You need a budget solution & be cost-effective when running the system
  • You want to connect it to a static camera – for example, monitoring an assembly line
  • The system cannot be connected to the internet – for example, because your factory is in a remote place or for security reasons

The biggest challenges in Africa & South Africa remain connectivity and accessibility. AI systems that can run in house and offline can have great potential in such environments.

Deloitte: Industry 4.0 – Is Africa ready for digital transformation?

Object Detection with Jetson Nano

If you need real-time object detection processing, use the Yolo-V4-Tiny model proposed in the AlexeyAB/darknet repository. Other, more powerful architectures are available as well. Here is a table of the FPS you can expect when using Yolo-V4-Tiny on Jetson:

Architecture      mAP @ 0.5   FPS
yolov4-tiny-288   0.344       36.6
yolov4-tiny-416   0.387       25.5
yolov4-288        0.591       7.93

Source: GitHub

After the model’s training is completed, the next step is the conversion of the weights to the TensorRT runtime. TensorRT runtimes make a substantial difference in speed performance on Jetson Nano. So train the model with AlexeyAB/darknet and then convert it with the tensorrt_demos repository. The conversion has multiple steps, because you first convert the darknet Yolo weights to ONNX and then from ONNX to TensorRT.
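
The steps look roughly like this (a sketch based on the tensorrt_demos repository; script names and layout may change as the repository evolves):

cd tensorrt_demos/yolo
python3 yolo_to_onnx.py -m yolov4-tiny-416      # darknet weights -> ONNX
python3 onnx_to_tensorrt.py -m yolov4-tiny-416  # ONNX -> TensorRT engine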

There is always a trade-off between accuracy and speed. If you do not require a fast model, we also have good experience with CenterNet, which can achieve a really nice mAP with precise boxes. However, if you run models with TensorFlow or PyTorch backends, the speed is, in our experience, slower than with Yolo models. Luckily, we can train both architectures and export them in a format suitable for the Nvidia Jetson Nano.

Image Recognition on Jetson Nano

For any image categorization problem, I would recommend using a simple architecture such as MobileNetV2. You can select, for example, a depth multiplier of 0.35 and an image resolution of 128×128 pixels. In this way, you can achieve great performance in both speed and precision.

We recommend using the TFLite backend when deploying the recognition model on Jetson Nano. So train the model with the TensorFlow framework and then convert it to TFLite. You can train recognition models with our platform without any coding for free. Just visit the Ximilar App, where you can develop powerful image recognition models and download them for offline usage on Jetson Nano.
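
The conversion itself is just a few lines (a sketch, assuming a trained Keras model; file names are illustrative):

import tensorflow as tf

model = tf.keras.models.load_model("my_model.h5")
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()

with open("my_model.tflite", "wb") as f:
    f.write(tflite_model)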

A simple object detection camera system with counting of products can be deployed offline in your factory with Jetson Nano.

Jetson Nano is simple but powerful hardware. However, it is not as powerful as your laptop or desktop computer, which is why analyzing 4K images on Jetson will be very slow. I would recommend using at most 1080p camera resolution. We used a Raspberry Pi camera, which works very well with Jetson, and the installation is easy!

I should mention that you can come across some temperature issues with Jetson Nano. Jetson is normally shipped with a passive cooling system. However, if this small piece of hardware is going to run stably in a factory 24 hours a day, we recommend using an active cooling system like this one. Don’t forget to run the next command so the fan on your Jetson starts working:

sudo jetson_clocks --fan

Installation steps & tips for development

When working with Jetson Nano, I recommend following the guidelines by Nvidia; for example, here is how to install the latest TensorFlow version. There is also a great tool called jtop, which visualizes hardware stats such as GPU frequency, temperature, memory size, and much more:

jtop tool can help you monitor statistics on Nvidia Jetson Nano.

Remember, the Jetson shares its memory with the GPU. You can easily run out of the 4 GB when running the model and some programs alongside it. If you want to save more than 0.5 GB of memory on Jetson, then run Ubuntu with the LXDE desktop environment, which is more lightweight than the default Ubuntu environment. To increase memory, you can also create a swap file, but be aware that if your project requires a lot of memory, it can eventually destroy your microSD card. More great tips and hacks can be found on the JetsonHacks page.
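
Creating a 4 GB swap file follows the standard Linux procedure, roughly:

sudo fallocate -l 4G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile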

For improvement of the speed of Jetson, you can also try these two commands, which will set the maximum power input and frequency:

sudo nvpmodel -m0
sudo jetson_clocks

When using the latest image for Jetson, make sure that you are working with the right version of the OpenCV library. For example, some older tracking algorithms from OpenCV, like MOSSE or KCF, require a specific version. For some tracking solutions, I recommend looking at the PyImageSearch website.

Developing on Jetson Nano

The experience of programming challenging projects, exploring new gadgets, and helping our customers is something that deeply satisfies us. We are looking forward to trying other hardware for machine learning such as Coral from Google, Raspberry Pi, or Intel Movidius for Industry 4.0 projects.

Most of the time, we are developing a machine learning API for large e-commerce sites. We are really glad that our platform can also help us build machine learning models on devices running in distant parts of the world with no internet connectivity. I think that there are many more opportunities for similar projects in the future.

Flows – The Game Changer for Next-Generation AI Systems

Flows is a service for combining machine learning models for image recognition, object detection and other AI services into API.

We have spent thousands of man-hours on this challenging subject. Gallons of coffee later, we introduced a service that might change how you work with data in Machine Learning & AI. We named this solution Flows. It enables simple and intuitive chaining and combining of machine learning models. This simple idea speeds up the workflow of setting up complex computer vision systems and brings unseen scalability to machine learning solutions.

We are here to offer a lot more than just training models, as common AI companies do. Our purpose is not to develop AGI (artificial general intelligence), which is going to take over the world, but easy-to-use AI solutions, that can revolutionize many areas of both business and daily life. So, let’s dive into the possibilities of flows in this 2021 update of one of our most-viewed articles.

Flows: Visual AI Setup Cannot Get Much Easier

In general, on our platform, you can break your machine learning problem down into smaller, separate parts (recognition, detection, and other machine learning models called tasks) and then easily chain & combine these tasks with Flows to achieve the full complexity and hierarchical classification of a visual AI solution.

A typical simple use case is conditional image processing. For instance, the first recognition task filters out non-valid images, then the next one decides a category of the image and, according to the result, other tasks recognize specific features for a given category.

A simple combination of machine learning models in a flow.

Flows allow your team to review and change datasets of all complexity levels fast and without any trouble. It doesn’t matter whether your model uses three simple categories (e.g. cats, dogs, and guinea pigs) or works with an enormous and complex hierarchy with exceptions, special conditions, and interdependencies.

It also enables you to review the whole dataset structure, analyze, and, if necessary, change its logic due to modularity. With a few clicks, you can add new labels or models (tasks), change their chaining, change the names of the output fields, etc. Neat? More than that!

Think of Flows as Zapier or IFTTT in AI. With flows, you simply connect machine learning models, and review the structure anytime you need.

Define a Flow With a Few Clicks

Let’s assume we are building a real estate website, and we want to automatically recognize different features that we can see in the photos. Different kinds of apartments and houses have various recognizable features. Here is how we can define this workflow using recognition flows (we trained each model with a custom image recognition service):

An example of a real estate classifier made of machine learning models combined in a flow.

The image recognition models are chained in a “main” flow called the branch selector. The branch selector saves the result in the same way as a recognition task node and also chooses an action based on the result of this task. First, we let the top category task recognize the type of estate (Apartment vs. Outdoor house). If it is an apartment, we can see that two subsequent tasks are “Apartment features” and “Room type”.

A flow can also call other flows, so-called nested flows, and delegate part of the work to them. If the image is an outdoor house, we continue processing by another nested flow called “Outdoor house”. In this flow, we can see another branching according to the task that recognizes “House type”. Different tasks are called for individual categories (Bungalow, Cottage, etc.):

An example use of nested flows – the main flow calls other nested flows to process images based on their category.

Flow Elements We Used

So far, we have used three elements:

  • A recognition task, that simply calls a given task and saves the result into an output field with a specified name. No other logic is involved.
  • A branch selector, on the other hand, saves the result in the same way as a recognition task node, but then it chooses an action based on the result of this task. 
  • A nested flow – another flow of tasks that the “main” flow (branch selector) calls.

Implicitly, there is also a List element present in some branches. We do not need to create it: as soon as we add two or more elements to a single branch, a list is generated in the background. All nodes in a list are executed in parallel by default, but you can also set sequential execution – in that case, a reordering button will appear.

Branch Selector – Advanced Settings

The branch selector is a powerful element, and it is worthwhile to explore what it can do. Let’s go through the most important options. By default, only the action of the tag or category with the highest relevance is performed in a single branch, provided the relevance (the probability outputted by the model) is above 50 %. But we can change this in the advanced settings: we can specify the threshold value and also enable parallel execution of multiple branches!

The advanced settings of a branch selector, enabling you to skip a task of a flow

You can specify the format of the results. Flat JSON means that results from all branches are saved on the same level as any previous outcomes, so if the same output name is used in multiple branches, the values can overwrite each other – and since parallel execution guarantees no particular order, the final value is unpredictable. You can prevent this by selecting nested JSON, which saves the results from each branch under a separate key based on the branch name (that is, the tag/category name).
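For illustration, here is a hypothetical result of the same flow in both formats (all field names are made up for this example):

Flat JSON – all branches write their results to the same level:

{"estate_type": "Apartment", "room_type": "kitchen", "features": ["modern", "bright"]}

Nested JSON – each branch keeps its results under a separate key (the tag/category name):

{"estate_type": "Apartment", "Apartment": {"room_type": "kitchen", "features": ["modern", "bright"]}}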

If some data (an output field) are already present in the incoming request, the branch selector can skip its processing altogether. You can define this in If Output Field Exists. This way, we can save credits and also improve the precision of the system. I will show you how useful this behaviour can be in the next paragraphs. To learn about the advanced options of training, check this article.
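For example, assuming a branch selector that stores its result in an output field named estate_type (a hypothetical name), a request record that already carries this field will skip the selector’s task entirely:

{"_url": "_URL_IMAGE_PATH_", "estate_type": "Apartment"}

With If Output Field Exists set to estate_type, the selector does not call its recognition task, and the flow continues directly along the “Apartment” branch.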

An Example: Fashion Detection With Tags

We have just created a flow to tag simple and basic pictures. That is cool, but can we really use it in real-life applications? Probably not. The reason is that most pictures contain more than one clothing item. So how are we going to automate the tagging of more complex pictures? The answer is simple: we can integrate object detection into flows and combine it with recognition & tagging models!

Example of Fashion Tagging combined with Object Detection in the Ximilar App

The flow structure then exactly mirrors the rich product taxonomy. Each image goes through a taxonomy tree in order to get the proper tags. It starts with our “top classifier” – a flow that assigns one of our seven top categories to a fashion product image and thus determines how the image will be classified further. For instance, if it is a “Clothing” product, the image continues to the “Clothing tagging” flow.

A “top classifier” – a flow that can tell one of our seven top categories of a fashion product image.

Similar to categorization or tagging, there are two basic nodes for object detection: the Detection Task for simple execution of a given task, and the Object Selector, which enables the processing of the detected objects.

The Object Selector calls an object detection task. The detected objects are extracted out of the image and passed further to any of the available nodes. Yes, any of them! Even another Object Selector – if, for example, you need to first detect people and then detect clothes on each person separately.

Object Selector – Advanced Settings

The Object Selector’s behavior can be customized in similar ways as the Branch Selector’s. In addition to the Probability Threshold, there is also an Area Threshold. By default, all detected objects are processed; by setting this threshold, objects that do not take up at least a given percentage of the image are simply ignored. In Select, this can be changed to processing only a single object, chosen by the highest probability or the largest area. As I mentioned, we extract the object before further processing. We can extend it a bit to include some context using Expand Bounding Box by…

Setting a threshold for the area an object should occupy in order to be processed

A Typical Flows Application: Fashion Tagging

We have been playing with the fashion subject since the inception of Ximilar. It is the most challenging and also the most promising area. We have created all kinds of tools and helpers for the fashion industry, namely Fashion Tagging, specialized Fashion Search, and Annotate. We are proud to have a very precise automatic fashion tagging service with a rich fashion taxonomy.

And, of course, Fashion Tagging is internally powered by Flows. It is a huge project with several dozen features to recognize, about a hundred recognition tasks, and hundreds of labels, all chained into several interconnected flows. For example, this is what our AI says about a simple dress now – and you can try it on your own picture in the public demo.

Example of fashion attributes assigned to a dress by the Ximilar Fashion Tagging flow

Include Pre-trained Services In Your Flow

The last group of nodes at your disposal are the Ximilar services. We are working hard on an ever-growing number of ready-to-use services that can be called through our API and integrated into your project. It is natural for our users to combine several AI services, and flows make it easier than ever. You can already call our ready-to-use recognition services from a flow.

More will come in the future – for example, Remove Background.

Increasing Possibilities of Flows

As our app and list of services grow, so do the flows. There are two features we are currently looking forward to. We are already building custom similarity models for our customers; as soon as they are ready, they will be available for combining in flows. And there is one more item very high on our list: predicting numeric values – regression, in machine learning terms. Stay tuned for more exciting news!

Create Your Flow – It’s Free

Before Flows, setting up a visual AI process was a tedious task for a skilled developer. Now everyone can set up, manage, and alter the steps on their own, in a comprehensive, visual way – optimizing the process quickly, getting a faster response, losing less time and money, and delivering higher quality to customers.

And what’s the best part? Flows are available to the users of Ximilar’s free plan, so you can try them right away. Register or sign in to the Ximilar App and enter the Flows service from the Dashboard. If you want to learn the basics first, check out our video tutorials. Then you can connect the tasks and labels defined in your own Image Recognition service.

Training of machine learning models is free with Ximilar; you only pay for the API calls for recognition. Read more about API calls or API credit packs. We strongly believe you will love Flows as much as we enjoyed bringing them to life. And if you feel like there is a feature missing, or if you prefer a custom-made solution, feel free to contact us!

Image Similarity as a Service For Your Web https://www.ximilar.com/blog/superfast-image-similarity-for-your-website/ Tue, 27 Jul 2021 16:43:13 +0000 https://www.ximilar.com/?p=1044 A step-by-step guide for using image similarity as a service. Find similar items with accurate & fast API for Image Search.

With the Image Similarity service added to the Ximilar App, you can build your own visual similarity engine powered by artificial intelligence in just a few clicks and several lines of code. Similarity search enables companies to significantly improve the user experience and increase revenue through smarter management of their visual data.

The technology behind image similarity is robust, reliable & fast. Built on state-of-the-art (SOTA) AI models and vector databases, you can search millions of images/products in milliseconds. It is used by big e-commerce players as well as small startups for showing visual alternatives or finding products with pictures. Some of our customers have hundreds of millions of images in their collections and do more than 100 million requests per month. Let’s dive into building a superfast similarity search service for your web.


What is Image Similarity?

Image Similarity, or image similarity search, is a visual AI service for comparing, grouping, and recommending visually similar images. A typical use case is the recommendation of similar items in e-shops. It can also be used for reverse image search, where the query is an external image and the results are images from the collection. This approach gives far more accurate results than searching by tags, labels, and other attributes.

Ximilar is using state-of-the-art deep learning models for all visual search services. We build our own indexing & searching technology that can run either as a service or on your hardware if needed. The collections can be focused on product photos, fashion, image matching, or generic photos (stock images).
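Under the hood, every image is converted by a neural network into a numeric vector (an embedding), and searching means finding the nearest vectors in the collection. The following is only a conceptual sketch of that principle with random vectors – not our actual models or index:

# pip install numpy
import numpy as np

# Toy stand-ins: 1,000 indexed images, each represented by a 512-dimensional embedding
index = np.random.rand(1000, 512)
query = np.random.rand(512)

# Cosine similarity between the query vector and every indexed vector
scores = index @ query / (np.linalg.norm(index, axis=1) * np.linalg.norm(query))
top_3 = np.argsort(-scores)[:3]  # positions of the 3 most similar images

In production, a vector database with approximate nearest-neighbour indexing replaces this brute-force computation, which is how millions of images can be searched in milliseconds.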

Features of the Image Similarity Service

Here are several features of the Image Similarity service that we think are crucial:

  • Simple access through the Ximilar App (creating a collection on click) and connection to REST API
  • The scalable search service can handle collections with hundreds of millions of items (images, videos, etc.) and hundreds of requests per second, covering both CRUD operations and searching
  • The ultra-fast and reliable engine, deployed mostly in large e-commerce platforms – the query for finding the most visually similar product has low latency (milliseconds)
  • The service is customizable – the platform enables you to train your own model for visual similarity search
  • Advanced filtering that supports JSON meta-data – if you need to restrict the result to a specific field
  • Grouping based on similarity – our search technology can group photos of the same product as one item
  • Security and privacy of your data – only meta-data and the visual representation of the images are stored, therefore your images are not stored anywhere
  • The service is affordable and cost-effective both for startups and enterprises, offering a free plan for testing as well as discounts as you grow over time
  • We can deploy it on your hardware, independently of our infrastructure, and also offline – custom similarity model and deployment appropriate to your needs
  • Our search engine and machine learning models improve constantly, maintaining much higher quality than open-source alternatives – and we are able to build custom search engines with trained models

Applications Using Visual Similarity

According to this research by Deloitte, merchandising with artificial intelligence is more and more relevant, and recommendation engines play a vital part in it. Here are a few use cases for visual similarity engines:

  • E-shops that use product similarity to help customers to browse and find related products (e.g. in fashion & luxury items, home decor & furniture, art, wall art, prints & posters, collectible trading cards, comics, trademarks, etc.)
  • Stock photo databases suggesting similar content – getting visual alternatives of photos, designs, product images, and videos
  • Finding the exact products – apps like Vivino for finding wine or any other kind of product are easy for us to develop
  • Visual similarity duplicate finder (also image matching or deduplication), to know which images are already in your database, or which product photos you can merge together
  • Reverse image search – finding a product or an image with a picture online
  • Finding similar real estate based for example on interior design, furniture, garden, etc.
  • Comparing two images for similarity – for example patterns or designs
Showing similar wall art with a jungle pattern. [Source]

Recommending products to your customers has several advantages. Firstly, it creates a better user experience and helps your customers find the right products faster. Secondly, it instantly increases the purchase rate on your website. This means a win on both sides – satisfied customers and higher revenue for you. Read more about customer experience and product recommendations in our blog post on fashion search.

Creating the Collection

So let’s take a look at how to easily build your own similarity search engine on the Ximilar platform. The first step is to log in to the Ximilar App. If you don’t have an account, sign up – it’s free and takes just a minute. Then, on the Dashboard, click on the Visual Search tile and select the Image Similarity service. Go to Collections in the left menu and click on Create New Collection. A pop-up with different collection types will appear, and you need to select one.

The collection is a space where you upload your images and against which you perform search queries. You can choose from Generic Photo Collection, Product Photo Collection, Dominant Colors Similarity, and Image Matching. Clicking on one of the cards will create the collection for your account.

Pick one collection type suitable for your data to create your similarity application

Each of these collection types is suitable for different types of images:

  • Use Generic Photos if you work with stock photos
  • Pick a Product Photos collection if you are an e-commerce company
  • Select Image Matching to find duplicates in your images
  • For the fashion sector, we recommend using a specialized service called Fashion Search
  • Custom Similarity is suitable if you are working with another type of data (e.g. videos or 3D models). To do this, please schedule a call with us, and we will develop your own model tuned for your data. For instance, we built a photo search system for the Magic the Gathering Trading Cards for one of our customers.

For this real estate example, I will use a Generic Photo Collection. Its advantage is that it also supports searching images via a text query. We usually develop custom similarity models for real estate when customers need specific and more accurate results; however, for this simple use case, the generic model will be enough.

Format of Image Similarity Dataset

Example of real estate image that is inserted into the similarity search collection.

First, we need to prepare a text file with JSON records. Each record represents an image that we want to store/insert into our collection. The key field is "_url" with the image URL. The advantage of the _url is that you can directly see and inspect the results via app.ximilar.com.

You can also optionally send records with base64 data, which is great if your data is stored locally on your computer. Don’t worry, we are not storing the whole images (data or base64) in the collection database, just the URLs with all the other metadata present in the records.

The JSON records look like this:

{"_id": "1_1", "_url": "_URL_IMAGE_PATH_", "estate_id": "1", "category": "indoor", "subcategory": "kitchen", "tags": []}
{"_id": "1_2", "_url": "_URL_IMAGE_PATH_", "estate_id": "1", "category": "indoor", "subcategory": "kitchen", "tags": []}
...

If you don’t have image URLs, you can use either the "_file" or the "_base64" field for the image data (locally stored "_file" data are automatically converted by the Python client to base64). The image similarity engine indexes every record of the collection by extracting a representation from the image with a neural network model. However, we are not storing the images themselves in our engine, so only records that contain "_url" will be visualized in the Ximilar App.

You must store a unique identifier of each image in the "_id" field to identify your images in the collection. The value of this field must be a string. The search API endpoint returns these _id values – that is how you get the results of the visual search. You can also store additional fields in every JSON record and then use these fields for filtering, grouping, and tuning the similarity function (see below).

Filling the Collection With Your Data

The next step requires a few lines of code. We are going to insert the prepared images into our collection using our python-client library. You can install the library via pip or directly from GitLab. The usage of the client is very straightforward; basically, you can just use the script tools/collections/insert_json_records.py:

python insert_json_records.py --type generic --auth_token __YOUR_TOKEN__ --collection_id __COLLECTION_ID__ --path /path/to/the/file.json

You will find the collection ID and the Authorization token on the “collection page” in the Ximilar App. This script will run for a few minutes, depending on the size of your image dataset.
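If you prefer to stay in Python rather than run the script, the same can be sketched with the client library directly – assuming the similarity client exposes an insert method for JSON records, as in the examples in its repository:

# pip install ximilar-client
from ximilar.client import SimilarityPhotosClient

client = SimilarityPhotosClient("__YOUR_TOKEN__", "__COLLECTION_ID__")
records = [
    {"_id": "1_1", "_url": "_URL_IMAGE_PATH_", "estate_id": "1", "category": "indoor", "subcategory": "kitchen"},
]
client.insert(records)  # indexes the records in the collection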

Result: Finding Visually Similar Pictures

That was pretty easy, right? Now, if you go to the collections page, you will see something like this:

You can see your image similarity collection in the Ximilar App

All images from the JSON file were indexed, and now you can inspect the collection in the Ximilar App. Select Similarity Search in the left menu of the Image Similarity service and test how the similarity works. You can specify the query image by uploading it, by its URL or ID, or by choosing one of the randomly selected images from the collection.

Even though we have indexed just several hundred images, you can see that the similarity engine works pretty well. The first image is the query image, and the following images are its k-nearest neighbours:

Showing the most visually similar real estate to the first image

The next step might be to integrate the service into your application via the API. You can either use the REST API for searching visually similar images directly or, if you are using Python, use our Python SDK client like this:

# pip install ximilar-client
from ximilar.client import SimilarityPhotosClient
client = SimilarityPhotosClient("_API_TOKEN_", "_COLLECTION_ID_")
# search k nearest items
client.search({"_id": "1"}, k = 3)
# search by external image
client.search({"_url": "_URL_PATH_"})

Advanced Features for Photo Similarity

The search for visually similar images can be combined with filtering on metadata. This metadata can be stored in the JSON records, as in our example with the "category" and "subcategory" fields. In the API, the filtering is specified using a MongoDB-like syntax – see the documentation.

For example, let’s say that we want to search for images similar to the image with ID 1_1 that are indoor photos taken in a kitchen. We assume that this meta-information is stored in the "category" and "subcategory" fields of every JSON record. The query will look like this:

client.search({"_id": "1_1"}, filter={"category": "indoor", "subcategory": "kitchen"})

If we know that we will often filter on some fields, we can specify them in the “Fields to index” option of the collection to make the query processing more efficient.
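The MongoDB-like syntax also allows combining several conditions. As a sketch (assuming the operators work as in the documentation; the "bathroom" value is hypothetical and not part of our example data), this would search for kitchens or bathrooms of one particular estate:

client.search(
    {"_id": "1_1"},
    filter={
        "estate_id": "1",
        "$or": [{"subcategory": "kitchen"}, {"subcategory": "bathroom"}],
    },
)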

You can specify which field from the JSON records defines your SKU identifier

Often, your data contains several photos of one “object” – a product or, in our example, a real estate property. Our service can group the search results not by individual photos but by product IDs. You can set this in the advanced options of the collection by putting the name of the field that identifies the real estate (here estate_id) into the Product ID option, and the magic will happen.

Enhancing Image Similarity Engine with Tags

The image similarity is based purely on the visual content of the image. However, you can use your tags (labels, keywords) to enhance the similarity search. In our example, we assume that the data already contain categories, subcategories, and tags. In order to enhance the visual similarity search with tags, fill the "tags" field of every record with your tags and use the method /v2/visualTagsKNN. After that, your search results will be based on a combination of visual similarity and keywords.
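As a rough sketch of calling this method over REST – the URL prefix and payload fields here are assumptions, so check the API documentation for the authoritative format:

import requests

headers = {"Authorization": "Token __YOUR_TOKEN__", "collection-id": "__COLLECTION_ID__"}
payload = {"query_record": {"_id": "1_1", "tags": ["kitchen"]}, "k": 5}  # assumed payload keys
response = requests.post(
    "https://api.ximilar.com/similarity/photos/v2/visualTagsKNN",  # assumed path prefix
    headers=headers,
    json=payload,
)
print(response.json())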

If you don’t have categories and tags, you can create your own photo tagger through our Image Recognition service, and enrich your image data automatically before indexing. The possibilities of image recognition models and their combinations are endless, resulting in highly customizable solutions. Read our guide on how to build your own Image Recognition API.

With the Ximilar Image Recognition service, you can create custom tagging models for your images

You can build several models:

  • One classifier for categorizing indoor/outdoor/floor plan photos
  • One classifier for getting room type (Bedroom, Kitchen, Living room, etc.)
  • One tagger for outdoor tags like (Pool, Garden, Garage, House view, etc.)

To Sum Up

The real estate photo similarity search is only one of many use cases of visual similarity (fashion, e-commerce, art, stock photos, healthcare…). We hope that you will enjoy working with this service, and we are looking forward to seeing your projects based on it. Thanks to our developers Libor and Ludovit, you can use this service through the frontend app.

The Visual Similarity service by Ximilar is unique in terms of search quality, speed, and the possibilities of the API. Our engineers are constantly upgrading the quality of the search, so you don’t have to. We are able to build custom solutions suitable for your data. With multiple collections, you can even A/B test the performance on your websites. All of this can run in our cloud as SaaS or in your warehouse! If you have more questions about pricing and technical details, or if you would like to run the similarity search engine on your own machines, contact us.

How to Build Your Own Image Recognition API? https://www.ximilar.com/blog/how-to-build-your-own-image-recognition-api/ Fri, 16 Jul 2021 10:38:27 +0000 https://www.ximilar.com/?p=5078 Tips and tricks for developing and improving your custom image recognition models and deploying them as API with the Ximilar platform.

Image recognition systems are still young, but they are becoming more available every day. Usually, custom image recognition APIs are used for better filtering and recommendation of products in e-shops, sorting stock photos, or classification of defects and pathological findings. Ximilar, like Apple’s Vision SDK or Google’s TensorFlow, makes the training of custom recognition models easy and affordable. However, not many people and companies have been using this technology to its full potential so far.

For example, I recently had a conversation with a client who said that Google Vision didn’t work for him and returned irrelevant tags. The problem was not the API but the approach to it. He had employed a few students to do the labelling and create an image classifier, but the results were not good at all. After I showed him our approach and shared some tips and simple rules, he got better classification results almost immediately. This post should serve as a comprehensive guide for those who build their own image classifiers and want to get the most out of them.


How to Begin

Image recognition is based on the techniques of machine learning and computer vision. It is able to categorize and tag images with tags describing the attributes recognized in them. You can read everything about the service and its possibilities here.

To train your own Image Recognition models and create a system accessible through API, you will first need to upload a set of training images and create your image recognition tasks (models). Then you will use the training set to train the models to categorize the images.

If you need your images to be tagged, you should upload or create a set of tags and train tagging tasks. As the last step, you can combine these tasks into a Flow and, thanks to its modular structure, modify or replace any of them anytime. You can then gradually improve your accuracy based on testing, evaluation metrics, and feedback from your customers. Let’s have a look at the basic rules you should follow to reach the best results.

The Basic Rules for Image Recognition Models Training

Each image recognition task contains at least two labels (classes, categories) – e.g., cats and dogs. A classic image recognition model (task) assigns one label to each image – so the image is either a cat or dog. In general, the more classes you have, the more data you will need to teach the neural network to predict labels.

Binary classification for cats and dogs. Source: Kelly Lacy (Pexels), Pixabay

The training images should represent the real data that will be analyzed in a production setting. For example, if you aim to build a medical diagnostic tool helping radiologists identify the slightest changes in the lung tissue, you need to assemble a database of x-ray images with proven pathological findings. For the first training of your task, we recommend sticking to these simple rules:

  • Start with binary classification (two labels) –  use 50–100 images/label
  • Use about 20 labels for basic and 100 labels for more complex solutions
  • For well-defined labels use 200+ images/label
  • For hard to recognize labels add 100+ images/label
  • Pattern recognition – for structures, x-ray images, etc. use 50–100 images/label

Always keep in mind that training one task with hundreds of labels on a small dataset almost never works. To start with, you need at least 20 images per label, and 100+ images per label to achieve solid results. Start with the recommended counts, and then add more if needed.

You can create your image recognition model via app.ximilar.com without coding

The Difference Between Testing & Production

The users of Ximilar App can train tasks with a minimum of 20 images per label. Our platform automatically divides your input data into two datasets – training & test set, usually in a ratio of 80:20. The training set is used to optimize the parameters of the classifier. During the training, the training images are augmented in several ways to extend the set.

The test data (about 20 %) are then used to validate and measure the accuracy by simulating how the model will perform in production. You can see the accuracy results on the Task dashboard in the Ximilar App. You can also create an independent test dataset and evaluate it. This is a great way to get accurate results on a dataset that the model has not seen during training, before you actually deploy it.

Remember, the lower limit of 20 images per label usually leads to weak results and low accuracy. While it might be enough for your testing, it won’t be enough for production: a model trained on so few images tends to memorize them instead of learning to generalize, which is called overfitting. Most of the time, the accuracy in Ximilar is pretty high, easily over 80 % for small datasets. However, it is common in machine learning to use more images for more stable and reliable results in production. Some tasks need hundreds or thousands of images per label for a good performance of the production model. Read more about the advanced options for training.

The Best Practices in Image Recognition Training

Start With Fewer Categories

I usually advise first-time users to start with up to 10 categories. For example, when building an app for recognizing shoes, you would start with 10 shoe types (running, trekking, sneakers, indoor sport, boots, mules, loafers…). It is easier to train a model with 10 labels, each with 100 training images of a shoe type, than one with 30 types. You can let users upload new shoe images; this way, you can get an amazing training dataset of real images within a month and then gradually update your model.

Use Multiple Recognition Tasks With Fewer Categories

The simpler classifiers can be incredibly helpful. In practice, we can end up with more than 30 types of shoes in one model; however, as we said, such a model is harder to train. Instead, we can create a system with better performance if we build one model for classifying footwear into main types – Sport, Casual, Elegant, etc. – and then another classifier for each of the main types. So for Sport, there will be a model that classifies sports shoes into Running shoes, Sneakers, Indoor shoes, Trekking shoes, Soccer shoes, etc.

Use Binary Classifiers for Important Classes

Imagine you are building a tagging model for real estate websites, and you have a small training dataset. You can first separate your images by estate type. For example, start with a binary classifier that separates images into the groups “Apartment” and “Outdoor house”. Then you can train more models specifically for room types (kitchen, bedroom, living room, …), apartment features, room quality, etc. These models will be used only if the image is labelled as “Apartment”.

Ximilar Flows allow you to connect multiple custom image recognition models into one API

You can connect all these tasks via the Flows system with a few clicks. This way, you can chain multiple image recognition models into one API endpoint and build a powerful visual AI. Typical use cases for Flows are in the e-commerce and healthcare fields. Systems for fashion product tagging can contain thousands of labels, and it is hard to train just one model with thousands of labels that will have good accuracy. But if you divide your data into multiple models, you will achieve better results in a shorter time! For the labelling work, you can use our image Annotation system if needed.
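To make the idea concrete, here is roughly what such conditional chaining looks like if you orchestrate it yourself in client code. The task IDs are hypothetical, and the get_task/classify calls are an assumption based on the Python SDK’s examples – a Flow performs this branching server-side in a single API call:

from ximilar.client import RecognitionClient

client = RecognitionClient("__YOUR_TOKEN__")
top_task, _ = client.get_task("__TOP_CATEGORY_TASK_ID__")  # hypothetical task ID

result = top_task.classify([{"_url": "_URL_IMAGE_PATH_"}])
best = result["records"][0]["best_label"]["name"]

# Branch in your own code: call the specialized task only when it is relevant
if best == "Apartment":
    room_task, _ = client.get_task("__ROOM_TYPE_TASK_ID__")  # hypothetical task ID
    rooms = room_task.classify([{"_url": "_URL_IMAGE_PATH_"}])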

Choose Your Training Images Wisely

Machine learning performs better if the training pictures come from the same distribution as the pictures evaluated in production. In other words, your training pictures should be very visually similar to the pictures your model will analyze in a production setting. So if your model will be used on CCTV footage, then your training data must come from CCTV cameras. Otherwise, you are likely to build a model that performs great on the training data but completely fails in production.

The same applies to real estate and other fields. If the system will analyze images of real estate that were not all taken by professional photographers, then you need to include photos from smartphones, with bad lighting, blurry images, etc.

Typical home decor and real estate images used for image recognition. Your model should be able to recognize both professional and non-professional images. Source: Pexels.

Improving the Accuracy of the System

When you click the training button on the task page, a new model is created and put into the training queue. If you upload more data or change labels, you can train a new model. You can have multiple versions of a model and deploy to the API only the specific version that works best for you. At the bottom of the task page, you can find a table with all your trained models (only the last 5 are stored). For each trained model, we store several metrics that are useful when deciding which model to pick for production.

Multiple versions of models for your image recognition task in the Ximilar Platform. Click on activate, and this version will be deployed as API.

Inspect the Results and Errors

Click on the zoom icon in the list of trained models to inspect the results. You can see the basic metrics: Accuracy, Recall, and Precision. Precision tells you how often the model is right when it predicts a specific label. Recall tells you how many of the images that truly belong to a label the model actually finds. If we have high recall but lower precision for the label “Apartment” from our real estate example, then the model probably predicts “Apartment” for almost every image (even for images that should be “Outdoor house”). The solution is probably simple – just add more pictures that represent “Outdoor house”.
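A quick illustrative calculation: if the model labels 100 test images as “Apartment” and 80 of them really are apartments, precision is 80/100 = 80 %. If the test set contains 90 apartments in total and the model found 80 of them, recall is 80/90 ≈ 89 %. In terms of formulas, precision = TP / (TP + FP) and recall = TP / (TP + FN), where TP, FP, and FN stand for true positives, false positives, and false negatives for the given label.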

The Confusion matrix shows you which labels are easily confused by the trained model. These labels probably contain similar images, so it is hard for the model to distinguish between them. Another useful component is Failed Images (misclassified), which shows you the model’s mistakes on your data. With Failed Images, you can also spot labelling mistakes in your data and fix them immediately. All of these features will help you build a more reliable model with good performance.

Inspecting the results of your trained models can show you potential problems in your data

Reliability of the Image Recognition Results

Every client is looking for reliability and robustness. Stay simple if you aim for high accuracy. Build models with just a few labels if you can; for more complex tagging systems, use Flows. Building an image classifier with a limited number of training images requires an iterative approach. Here are a few tips on how to achieve high accuracy and reliable results:

  • Break your large task into simple decisions (yes or no) or basic categories (red, blue and green)
  • Make fewer categories & connect them logically
  • Use general models for general categories
  • Make sure your training data represent the real data your model will analyze in production
  • Each label should have a similar number of images, so the data is balanced
  • Merge very close classes (visually similar), then create another task only for them, and connect it via Flows
  • Use both human and UI feedback to improve the quality of your dataset – inspect evaluation metrics like Accuracy, Precision, Recall, Confusion Matrix, and Failed Images
  • Always collect new images to extend your dataset

Summary for Training Image Recognition Models

Building an image classifier requires a proper task definition and continuous improvement of your training dataset. If the size of the dataset is challenging, start simple and gradually iterate towards your goal. To make the basic setup easier, we created a few step-by-step video tutorials. Learn how to deploy your models for offline use here, check the other guides, or see our API documentation. You can also see for yourself how our pre-trained models perform in the public demo.

We believe that with the Ximilar platform, you are able to create highly complex, customizable, and scalable solutions tailored to the needs of your business – check out the use cases for quality control, visual search engines, or fashion. The basic features in our app are free, so anyone can try it. Training of image recognition models is also free with the Ximilar platform; you simply pay only for calling the model for predictions. We are always here to discuss your custom projects and all the challenges, in person or on a call. If you have any questions, feel free to contact us.

Image Annotation Tool for Teams https://www.ximilar.com/blog/image-annotation-tool-for-teams/ Thu, 06 May 2021 11:55:57 +0000 https://www.ximilar.com/?p=4115 Annotate is an advanced image annotation tool supporting complex taxonomies and teamwork on computer vision projects.

Through the years, we have worked with many annotation tools. The problem is that most desktop annotation apps are offline and intended for single-person use, not for team cooperation. Web-based apps, on the other hand, mostly focus on data management with photo annotation, not on the whole ecosystem with an API and inference systems. In this article, I review what a good image annotation tool should do, and explain the basic features of our own tool – Annotate.

Every big machine learning project requires the active cooperation of multiple team members – engineers, researchers, annotators, product managers, or owners. For example, supervised deep learning for object detection, as well as for segmentation, outperforms unsupervised solutions; however, it requires a lot of data with correct annotations. Annotation of images is one of the most time-consuming parts of every deep learning project, and therefore picking the right annotation tool is critical. When your team is growing and your projects require higher complexity over time, you may encounter new challenges, such as:

  • Adding labels to the taxonomy would require re-checking a lot of your work
  • Increasing the performance of your models would require more data
  • You will need to monitor the progress of your projects

Building solid annotation software for computer vision is not an easy task. And yes, it requires a lot of failures and taking many wrong turns before finding the best solution. So let’s look at the basic features an advanced data annotation tool should have.

What Should an Advanced Image Annotation Tool Do?

Many customers are using our cloud platform Ximilar App in very specific areas, such as Fashion, Healthcare, Security, or Industry 4.0. The environment of a proper AI helper or tool should be complex enough to cover requirements like:

  • Features for team collaboration – you need to assign tasks, and then check the quality and consistency of data
  • Great user experience for dataset curation – everything should be as simple as possible, but no simpler
  • Fast production of high-quality datasets for your machine-learning models
  • Work with complex taxonomies & many models chained with Flows
  • Fast development and prototyping of new features
  • Connection to the REST API with the Python SDK & querying annotated data

With these needs in mind, we created our own image annotation tool. We use it in our internal projects and provide it to our customers as well. Our technologies for machine learning accelerate the entire pipeline of building good datasets. Whether you are a freelancer tagging pictures or a team managing product collections in e-commerce, Annotate can help.

Our Visual AI tools enable you to work with your own custom taxonomy of objects, such as fashion apparel or things captured by the camera. You can read the basics on the categories & tags and machine learning model training, watch the tutorials, or check our demo and see for yourself how it works.

The Annotate

Annotate is an advanced image annotation tool, which enables you to annotate images precisely and fast. It works as an end-to-end platform for visual data management. You can query the same images, change labels, create objects, draw bounding boxes and even polygons here.

It is a web-based online annotation tool that works fully in the cloud. Since it is connected to the same back end & database as the Ximilar App, all changes you make in Annotate manifest in your workspace in the App, and vice versa. You can create labels, tasks & models, or upload images through the App, and use them in Annotate.

Ximilar Application and Annotate are connected to the same backend (api.ximilar.com) and the same database.

Annotate extends the functionalities of the Ximilar App. The App is great for training, creating entities, uploading data, and batch management of images (bulk actions for labelling and filtering). Annotate, on the other hand, was created for the detail-oriented management of images. The default single-zoomed image view brings advantages, such as:

  • Identifying separate objects, drawing polygons and adding metadata to a single image
  • Suggestions based on AI image recognition help you choose from very complex taxonomies
  • The annotators focus on one image at a time to minimize the risk of mistakes

Interested in getting to know Annotate better? Let’s have a look at its basic functions.

Deep Focus on a Single Image

If you enter the Images (left menu), you can open any image in the single image view. To the right of the image, you can see all the items located in it. This is where most of the labelling is done. There is also a toolbar for drawing objects and polygons, labelling images, and inspecting metadata.

In addition, you can zoom in/out and drag the image. This is especially helpful when working with smaller objects or big-resolution images. For example, teams annotating medical microscope samples or satellite pictures can benefit from this robust tool.

The main view of the image in our Fashion Tagging workspace

Create Multiple Workspaces

Some of you already know this from other SaaS platforms. The idea is to divide your data into several independent storages. Imagine your company is working on multiple projects at the same time and each of them requires you to label your data with an image annotation tool. Your company account can have many workspaces, each for one project.

Here is our active workspace for Fashion Tagging

Within a workspace, you don’t mix your images, labels, and tasks with those of other projects. For example, one workspace contains only images for a fruit recognition project (apples, oranges, and bananas), and another contains data on animals (cats and dogs).

Your team members can get access to different workspaces, and everyone can switch between the workspaces both in the App and in Annotate (top right, next to the user icon). Did you know that the workspaces are also accessible via the API? Check out our documentation and learn how to connect to the API.
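As a minimal sketch – assuming the Python client accepts a workspace identifier on construction, as described in the documentation (the ID here is hypothetical):

from ximilar.client import RecognitionClient

# All subsequent calls of this client operate inside the given workspace
client = RecognitionClient("__YOUR_TOKEN__", workspace="__WORKSPACE_ID__")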

Train Precise AI Models with Verification

Building good computer vision models requires a lot of data, high-quality annotations, and a team of people who understand the process of building such a dataset. In short, to create high-quality models, you need to understand your data and have a perfectly annotated dataset. In the words of the Director of AI at Tesla, Andrej Karpathy:

“Labeling is a job for highly trained professionals.” – Andrej Karpathy (Director of AI at Tesla)

Annotate helps you build high-quality AI training datasets through verification. Every image can be verified by different users in the workspace, and you can increase the precision by training your models only on verified images.

A list of users who verified the image, with the exact dates

Verifying your data is a necessary requirement for the creation of good deep-learning models. To verify an image, simply click the Verify button, or Verify and next if you are working on a job. You will be able to see who verified any particular image and when.

Create and Track Image Annotating Jobs

When you need to process newly uploaded images, you can assign them to a Job, and a team of people can process them one by one in a job queue. You can also set exactly how many times each image should be seen by the people processing this queue.

Moreover, you can specify which photo recognition model or flow of models should be displayed when doing the job. For example, here is the view of the jobs that we are using in one of our tagging services.

Two jobs are waiting to be completed by annotators; you can start working by hitting the play button on the right

When working on a job, every time an annotator hits the Verify & Next button, they are redirected to a new image within the job. You can track the progress of each job in the Jobs view. Once an image annotation job is complete, the progress bar turns green, and you can proceed to the next steps: retraining the models, uploading new images, or creating another job.

Draw Objects and Polygons

Sometimes, recognizing the most probable category or tags for an image is not enough. That is why Annotate provides the possibility to identify the location of specific things by drawing objects and polygons. The great thing is that you do not pay any credits for drawing objects or labelling, which makes Annotate one of the most cost-effective online apps for image annotation.

Simply click and drag with the rectangle tool on the canvas to create a detection object

So what exactly do you pay for when annotating data? API credits are counted only for data uploads, with volume-based discounts. This makes Annotate an affordable, yet powerful tool for data annotation. If you want to know more, read our newest article on API Credit Packs, check our Pricing Plans, or see the Documentation.

Annotate With Complex Taxonomies Elegantly

The greatest advantage of Annotate is working with very complex taxonomies and attribute hierarchies. That is why it is usually used by companies in E-commerce, Fashion, Real Estate, Healthcare, and other areas with rich databases. For example, our Fashion tagging service contains more than 600 labels that belong to more than 100 custom image recognition models. The taxonomy tree for some of the biotech projects can be even broader.

Navigating through the taxonomy of labels is very elegant in Annotate – via Flows. Once your Flow is defined (our team can help you with it), you simply add labels to the images. The branches expand automatically as you add labels; in other words, you always see only the labels essential for your images.

Simply navigate through your taxonomy tree, expanding branches by clicking on specific labels

For example, in this image there is a fashion object “Clothing”, to which we need to assign more labels. Adding the Clothing/Dresses label will expand the tags in the Length Dresses and Style Dresses tasks. If you select the label Elegant from Style Dresses, only the features & attributes you need will be suggested for annotation.

Automate Repetitive Tasks With AI

Annotate was initially designed to speed up the work when building computer vision solutions. When annotating data, manual drawing & clicking is a time-consuming process. That is why we created AI helper tools that automate much of the annotating process in just a few clicks. Here are a few things you can do to speed up the entire annotation pipeline:

  • Use the API to upload your previously annotated data to train or re-train your machine learning models and use them to annotate or label more data via API
  • Create bounding boxes and polygons for object detection & instance object segmentation with one click
  • Create jobs, share the data, and distribute the tasks to your team members
Predicting bounding boxes with one click automates the entire process of annotation

Image Annotation Tool for Advanced Visual AI Training

As the main focus of Ximilar is AI for sorting, comparing, and searching multimedia, we integrated the annotation of images into the building of AI search models. This is something we miss in all other data annotation applications. To build such models, you need to group multiple items (images or objects, typically product pictures) into Similarity Groups. Annotate helps us create datasets for building strong image similarity search models.

Grouping the same or similar images with the Image Annotation Tool. You can tell which item is a smartphone photo or which photos belong together on an e-commerce platform.

Annotate is Always Growing

Annotate was originally developed as our internal image annotation software, and we have already delivered a lot of successful solutions to our clients with it. It is a unique product that any team can benefit from to improve their computer vision models unbelievably fast.

In the future, we plan to introduce more data formats, such as videos, satellite imagery (Sentinel maps), and 3D models, to level up Visual AI in fields like visual quality control or AI-assisted healthcare. We are also constantly working on adding new features and improving the overall experience of Ximilar services.

Annotate is available for all users with Business & Professional pricing plans. Would you like to discuss your custom solution or ask anything? Let’s talk! Or read how the cooperation with us works first.

Everything You Need to Know About Fashion Search https://www.ximilar.com/blog/everything-you-need-to-know-about-fashion-search/ Wed, 07 Apr 2021 14:13:21 +0000 https://www.ximilar.com/?p=3133 After years of experience in e-commerce, we developed the Fashion Search enabling sellers to create their own fashion product discovery systems.

Keeping up with fashion trends is hard. And it is even harder to satisfy clothing buyers, especially when most of the traditional stores are closed and consumers’ preferences are shifting towards e-commerce. It is crucial not only to bring customers to your website but also to keep them there to increase revenue – and with the ever-growing technological progress, the competition is increasing as well. After years of experience in Fashion AI, Ximilar developed Fashion Search, a service tailored to the needs of customers dealing with a wide range of apparel (such as clothes, footwear, jewellery, and other accessories). Fashion Search covers apparel detection, tagging, sorting, and even suggestions based on visitors’ pictures, enabling you to build a memorable customer experience, make customers happier, and boost sales.

Customer Experience is Important

It all starts and ends with the customer experience. According to PwC, a good experience leaves people feeling heard and appreciated, while a disappointing one drives them away almost instantaneously. When the respondents were asked what they value the most, 80 % chose efficiency, convenience, and friendly service, and 75 % chose up-to-date technology.

[Source: Wayhomestudio]

80 % of American consumers say that speed, convenience, knowledgeable help and friendly service are the most important elements of a positive customer experience. Prioritize technologies that provide these benefits rather than adopting new technologies for the sake of being cutting edge.

PwC, Experience is Everything

The findings of the KPMG 2018 research were similar: customer experience is more influential than ever. More importantly, AI gains more clout by delivering personalized, customized, and localized experiences to customers. AI technologies, such as machine learning algorithms, intelligent search, or chatbots, take part in the entire product and service cycle nowadays. In addition, Accenture recently wrote that AI, neural networks, and image search are the leading technologies behind the apps taking over the market, and that they are important trends to watch in the following years.

Fashion E-Commerce Trends in 2021

Big trends in current fashion e-commerce are sustainability, slow fashion, second-hand apparel, and resale. For example, the second-hand clothes marketplace Vinted is growing exponentially. According to THREDUP, there are more second-hand shoppers than ever before, and resale is going to be bigger than fast fashion by 2029.

The numbers in fashion e-commerce are clear: we will work with users’ visual data on a much larger scale, and it is about time to step up our game with personalized recommendations.

There were major changes in consumer behaviour during the coronavirus pandemic as well. Glami’s 2021 Fashion Research showed that an astonishing half of the shops moved to the online world, and the number of online consumers is still growing. They are looking for visual inspiration, want shopping to be intuitive and fast, and expect personalized experiences more than ever.

What is Fashion Search?

Fashion Search is an advanced Visual AI service, custom-built for the fashion industry. It enables you to create your own product discovery system, work smarter with data, and build a better customer experience. Fashion Search brings together the services and features most requested by our fashion e-commerce customers:

  1. Object Detection & Fashion Tagging
    The fashion apparel in your images is automatically detected and tagged. Our Fashion Tagging works with a hundred recognition models, hundreds of labels, and dozens of features, all chained into interconnected flows, enabling you to add content 24/7. We refine the quality and add new fashion attributes (features, categories & tags) to this service constantly. However, custom tags are welcomed as well. Read more about Fashion Tagging here.

  2. Product Similarity & Search by Photo
    This Visual Similarity combo enables you to provide a personalized customer experience. Product Similarity finds and suggests similar products to the item your customer is viewing. As a result, the click rate increases by up to 380 %. Search by Photo accepts the pictures your visitors upload and automatically recommends similar items from your collection. For example, it can analyze fashion influencer photos and help your customers find trending items on your site. Read more about Visual Search in Fashion or How to Build Superfast Image Similarity for your Website.

  3. Synchronizing Product Data on Cloud
    Our customers’ image databases are synchronized to their private collections on the Ximilar cloud. When you upload new products or images to your website, our AI automatically recognizes the fashion attributes and provides tags. Synchronization has two major benefits: first, simple filtering or searching with fashion tags on your website; second, personalized recommendations of your products similar to the users’ images. The synchronization periodicity is up to you – this way, the visually similar results on your web are always up to date with your actual SKUs.

How Does Fashion Search Work?

The Fashion Search works as a customizable recommendation service allowing you to easily build your own fashion product discovery system. Imagine you are building a watch shop with product recommendations based on material, dial colour, and type. You can add your custom tags and show customers products of specific colours and types.

Product similarity for watches

The main added value of Fashion Search lies in the possibility of combining advanced automatic Fashion Tagging with Visual Search. The technology behind our Visual Search is complex: it contains more than a hundred deep-learning models as well as algorithms for the extraction of dominant colours and other features. It also allows advanced filtering of visually similar results based on user-provided attributes.

Besides bringing these things together, there’s more we implemented to make the Fashion Search an effective and reliable tool tailored to the needs of fashion e-commerce. See how it works in our public demo.

Advanced Tagging with Object Detection

Object Detection is an indispensable tool for processing complex pictures with many items to recognize and tag. The service detects all fashion-related items on the picture and then provides tags for individual items.

We work with seven Top Categories (or types) of apparel: Clothing, Footwear, Watches, Bags, Accessories, Jewellery, and Underwear. Our system classifies the products from these categories into a hundred different categories, subcategories, and features with about 600 labels.

Customization of Fashion Search with Profiles

[Gabrielle Henderson, Unsplash]

The most common questions we get from our customers in fashion are: “What if I need something that is not included in the Fashion Tagging?” and “What if I need to rename or translate the labels?”

We always welcome customization, and Fashion Search is no exception. It enables our users to create a fully customized environment. Adding a new feature requires only a few hundred example images per tag for the system to recognize it automatically. It is also natural that everyone has their own taxonomy and preferred language. Even the naming can differ, such as Jacket vs. Coat in the USA and the UK. Renaming and changing the category structure is done using a custom fashion profile according to your needs.

Our attitude is that fashion wouldn’t be fashion without personalization.

Also, specialized sellers have much higher requirements for the richness of labels if they want to offer the best product discovery experience. For example, a shop specializing in luxury jewellery may need to rename the tags for Colour (gold, golden, yellow gold, gold-coloured, rose gold) or add exclusive features and tags, such as the colour of the gemstones. If you like this approach, don’t hesitate to contact us.
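As a rough illustration of what such a profile can express, the snippet below renames and translates raw tags onto a shop's own taxonomy. The dictionary layout and the apply_profile helper are hypothetical, shown only for illustration; real profiles are configured together with the Ximilar team.

```python
# A hypothetical sketch of what a custom fashion profile might express:
# renaming labels to a shop's preferred wording and translating them.
# The dictionary layout and apply_profile helper are assumptions for
# illustration; real profiles are set up with the Ximilar team.
PROFILE = {
    "rename": {
        "gold": "yellow gold",   # luxury-jewellery wording
        "Jacket": "Coat",        # US vs. UK naming
    },
    "translate": {
        "Coat": {"de": "Mantel", "cs": "Kabát"},
    },
}

def apply_profile(tags, profile, language=None):
    """Map raw Fashion Tagging labels onto the shop's own taxonomy."""
    renamed = [profile["rename"].get(t, t) for t in tags]
    if language:
        renamed = [profile["translate"].get(t, {}).get(language, t)
                   for t in renamed]
    return renamed

# Example: apply_profile(["gold", "Jacket"], PROFILE, language="de")
# returns ["yellow gold", "Mantel"].
```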

Upgraded Fashion Taxonomy

Our fashion annotation team focuses on widening the fashion taxonomy and training new machine learning algorithms to recognize new features daily. Categories such as Clothing, Accessories or Footwear are constantly growing, while entirely new ones are being added. For instance, we recently added a new recognition model for Embellishment (e.g. embroidery or studs), rich sub-categories of Dresses, and new categories of Jewellery and Watches. In the last couple of months, we have added over 50 new categories and attributes, improving the precision of tagging results.

We have a lot of discussions about what’s next to explore and expand – and that is where the customer’s ideas come into play. Check if you can find your categories in our Fashion Taxonomy.

Download the Fashion Taxonomy Sheet Here

Advanced Analysis and Sorting of Fashion Images

Some customers need to obtain more detailed information about their images. Features such as the presence of a human model, background quality, or perspective can be used for filtering and sorting. To enable deeper image analysis, we built a specialized Meta Tagging model that can be used to:

  • identify if there’s a person in the picture,
  • filter images with white background,
  • sort product images by the view,
  • and many more.

This Visual AI model enables you to manage the display of product images and build a smoother customer experience. More importantly, it makes your website easier to understand and navigate with features such as the following (a minimal sketch of such an ordering appears after this list):

  • the first photo in the product list is always with/without a person,
  • the next photos are sorted by perspective,
  • the last photo is the detail, focused on the material, colour, or pattern,
  • all images with low quality are filtered out,
  • and so on.
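Purely as a sketch of the mechanics, a product gallery could be ordered from such meta tags as shown below; the tag keys ("person", "view", "quality") and the quality threshold are illustrative assumptions.

```python
# A sketch of ordering a product gallery from meta tags: low-quality
# images are dropped, shots with a human model come first, then views
# by perspective, with detail shots last. The tag keys and the quality
# threshold are illustrative assumptions.
VIEW_ORDER = {"front": 0, "back": 1, "side": 2, "detail": 3}

def order_gallery(images, quality_threshold=0.5):
    """images: dicts like
    {"url": ..., "person": bool, "view": str, "quality": float}"""
    usable = [img for img in images if img["quality"] >= quality_threshold]
    return sorted(
        usable,
        key=lambda img: (
            0 if img["person"] else 1,        # model shots first
            VIEW_ORDER.get(img["view"], 99),  # then by perspective
        ),
    )
```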

This way, the visitors will always know what to expect and where to click to find what they’re looking for. For instance, it is important to display jewellery in standardised high-quality images with details, but also on the human body. Seeing a ring or a necklace on a hand or neck helps customers understand the fit and size of the jewellery, as well as the colours of materials and gems in natural light.

Product page with customers’ inputs at Shein

Community content also makes e-shopping sites more personal and engaging for visitors. A good example is Shein, where users can upload pictures along with their product reviews. This strategy definitely adds to the trustworthiness and reliability of the seller.

How We Work with Data

Working with big specialized databases requires caution, patience, and a specific skill set. Our team of fashion annotators puts thousands of hours into building high-quality datasets with relevant labels and objects. To make their work more efficient and time well-spent, we developed our own image Annotation Tool, enhancing the functionalities of our App. It is the main reason why we deliver new features quickly, and it is available to our customers as well.

One of Ximilar’s greatest strengths is the detail-oriented yet efficient management of enormous databases of millions of pictures.

Each image shown to the model in training is verified numerous times by multiple annotators. Then, an optimisation mechanism makes sure we achieve the highest accuracy with the smallest possible number of training images. After that, everything is evaluated on a broad test dataset. As a result, we only deploy models that are inspected by machine learning specialists and precise in their predictions.

Never-Ending Innovations

What else? There have been significant improvements to our speed performance: in cooperation with the Intel AI team, we did a lot of work on our backend, so you can query the results in milliseconds. We firmly believe that our Fashion AI services are among the best on the market. Covering a variety of items from Clothing and Footwear to Accessories, our Fashion AI works with a very rich taxonomy. To sum up, we are now ready to focus on more new features: for example, mining relevant keywords from textual metadata, enhancing and upscaling product images with the Super-Resolution model, model explainability, and background removal, but also a better and richer customer experience.

Application of the Background Removal service on an image

Would you like to discuss your custom solution or ask anything? Read our story, check the pricing, or let’s talk!

The post Everything You Need to Know About Fashion Search appeared first on Ximilar: Visual AI for Business.
