Explainable AI: What is My Image Recognition Model Looking At?
https://www.ximilar.com/blog/what-is-your-image-recognition-looking-at/ – 07 Dec 2021

With the AI Explainability in Ximilar App, you can see which parts of your images are the most important to your image recognition models.

There are many challenges in machine learning, and developing a good model is one of them. Even though neural networks are very powerful, they have a great weakness. Their complexity makes it hard to understand how they reach their decisions. This might be a problem when you want to move from development to production, and it might eventually cause your whole project to fail. But how can you measure the success of a machine learning model? The answer is not easy. In our opinion, the model must excel in a production environment and should work reliably in both common and uncommon situations.

However, even when the results in production are good, there are areas where we can't simply accept black-box decisions without being sure how the AI made them. These are typically medicine and biotech, or any other field where there is no room for errors. We need to make sure that both the output and the way our model reached its decision make sense – we need explainable AI. For these reasons, we introduced a new feature to our Image Recognition service called Explain.

Training Image Recognition

Image Recognition is a Visual AI service enabling you to train custom models to recognize images or objects in them. In Ximilar App, you can use Categorization & Tagging and Object Detection, which can be combined with Flows. For example, the first task will detect all the human models in the image and the categorization & tagging tasks will categorize and tag their clothes and accessories.

Image recognition is a very powerful technology, bringing automation to many industries. It requires well-trained models, and, in the case of object detection, precise data annotation. If you are not familiar with using image recognition on our platform, please try to set up your own classifier first.

These resources should be helpful in the beginning:

From Model-Centric to Data-Centric with Explainable AI

Explaining which areas are important for the leaf disease recognition model when predicting a label called “canker”.

When you want a model which performs great in a production setting and has high accuracy, you need to focus on your training data first. Consistent labelling, cleaning datasets of unnecessary samples/labels, and adding missing feature-rich samples are much more important than the newest neural network architecture. Andrew Ng, an entrepreneur and professor at Stanford, also promotes this approach to building machine learning models.

The Explain feature in our App tells you:

  • which parts of images (features and pixels) are important for predicting specific labels
  • for which images the model will probably predict the wrong results
  • which samples should be added to your training dataset to improve performance

Simple Example: T-shirt or Not?

Let’s look at this simple example of how explainable AI can be useful. Let’s say we have a task containing two categories – t-shirts and shoes. For a start, we have 20 images in each category. It is definitely not enough for production, but it is enough if you want to experiment and learn.

This neural network, trained with the Ximilar SaaS platform, has two labels: shoes and t-shirt.

After playing with the advanced options and short training, the result seems really promising:

Using Explain on a Training Image

But did the model actually learn what we wanted? To check what the neural network finds important when categorizing our images, we will apply two different methods with the Explain tool:

  • Grad-CAM (first published in 2016) – this method is very fast, but the results are not very precise (a minimal sketch of the idea follows right after this list)
  • Blur Integrated Gradients (published in 2020) smoothed with SmoothGrad – this method provides much more detail, but at the cost of computational time (a sketch of the plain integrated-gradients variant follows after the figures below)
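
To give a rough idea of how the first method works, here is a minimal Grad-CAM sketch in Python (TensorFlow/Keras). It is only an illustration of the published technique, not the code running behind the Explain button, and the model, image and layer names are placeholders you would replace with your own:

```python
# Minimal Grad-CAM sketch (TensorFlow/Keras). "last_conv_layer" is the name of
# the convolutional layer whose feature maps you want to explain.
import tensorflow as tf

def grad_cam(model, image, last_conv_layer, class_index):
    # Model that maps the input to the chosen conv activations and the predictions.
    grad_model = tf.keras.Model(
        model.inputs,
        [model.get_layer(last_conv_layer).output, model.output],
    )
    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(image[None, ...])   # add a batch dimension
        class_score = preds[:, class_index]
    # Gradient of the class score with respect to the conv feature maps.
    grads = tape.gradient(class_score, conv_out)
    # Average the gradients per channel to get one weight per feature map.
    weights = tf.reduce_mean(grads, axis=(0, 1, 2))
    # Weighted sum of the feature maps, ReLU, and normalization to [0, 1].
    cam = tf.nn.relu(tf.reduce_sum(conv_out[0] * weights, axis=-1))
    return (cam / (tf.reduce_max(cam) + 1e-8)).numpy()
```

The resulting heatmap has the spatial resolution of the chosen layer, which is why Grad-CAM is fast but coarse; it is usually upscaled to the input size and overlaid on the image.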
Grad-CAM result of the Explain feature. As you can see, the model is looking mostly at the head/face.
Blur Integrated Gradients result of the Explain feature. The most important features are the head/face, similar to what Grad-CAM is telling us.
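
For readers curious how the gradient-path attributions behind the second method are computed, below is a minimal sketch of plain Integrated Gradients in Python (TensorFlow/Keras). It is a simplified illustration under assumptions (placeholder model, black-image baseline, straight interpolation path); the Blur variant replaces the straight path with progressively less blurred copies of the image, and SmoothGrad additionally averages the attributions over noisy copies of the input:

```python
# Minimal Integrated Gradients sketch (straight-path variant, not Blur IG).
import tensorflow as tf

def integrated_gradients(model, image, class_index, baseline=None, steps=50):
    image = tf.convert_to_tensor(image, tf.float32)
    if baseline is None:
        baseline = tf.zeros_like(image)          # a black image as the baseline
    # Interpolate between the baseline and the input image in small steps.
    alphas = tf.linspace(0.0, 1.0, steps + 1)[:, None, None, None]
    interpolated = baseline[None, ...] + alphas * (image - baseline)[None, ...]
    with tf.GradientTape() as tape:
        tape.watch(interpolated)
        preds = model(interpolated)
        class_scores = preds[:, class_index]
    # Gradients of the class score at every interpolation step.
    grads = tape.gradient(class_scores, interpolated)
    # Approximate the path integral (trapezoidal rule), scale by the input difference.
    avg_grads = tf.reduce_mean((grads[:-1] + grads[1:]) / 2.0, axis=0)
    return ((image - baseline) * avg_grads).numpy()   # per-pixel attributions
```

Because every interpolation (or blur) step needs its own forward and backward pass, this family of methods is noticeably slower than Grad-CAM, which matches the trade-off described above.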

In this case, both methods clearly demonstrate the problem with our model. The focus is not on the t-shirt itself, but on the head of the person wearing it. In the end, it was easier for the learning algorithm and the neural network to distinguish between the two categories using this feature instead of focusing on the t-shirt. If we look at the training data for the t-shirt label, we can see that all pictures include a person with a visible face.

Data for T-shirt label for the image recognition task. This small dataset contains only photos with visible faces, which can be a problem.

Explainability After Adding New Data

The solution might be adding more varied training data and introducing images without a person. Generally, it's a good approach to start with a small dataset and grow it over time. Adding visually diverse images helps the model avoid overfitting on the wrong features. So we added more photos to the label and trained the model again. Let's see what the results look like with the new version of the model:

After retraining the model on the new data, we can see an improvement in which features the neural network is looking at.

The Grad-CAM result on the left is not very convincing in this case. The image on the right shows the result of Blur Integrated Gradients. Here you can see how the focus moved from the head to the t-shirt. It seems like the head still plays some part, but there is much less focus on it.

Both methods for explainable AI have their drawbacks, and sometimes we have to try more pictures to get a better understanding of the model's behaviour. We also need to mention one important point: due to the way the algorithm works, it tends to prefer edges, which is clearly visible in the examples.

Summary

Explainability and interpretability of neural networks is a big research topic, and we are looking forward to adopting and integrating more techniques into our SaaS AI solution. The AI explainability we showed you here is only one of many tools on the road towards data-centric AI.

If you run into any trouble, do not hesitate to contact us. The machine learning specialists at Ximilar have vast experience with different kinds of problems and are always happy to help you with yours.

Evaluation on an Independent Dataset for Image Recognition
https://www.ximilar.com/blog/evaluation-on-an-independent-dataset-for-image-recognition/ – 14 Aug 2020

Monitor the performance of your custom models on a test dataset with the Ximilar platform.

Today, I would like to present the latest feature that many of you have been missing in our custom image recognition service. We haven't seen this feature in any other public platform that lets you train your own classification models.

Many machine-learning specialists and data scientists are used to evaluating models on a test dataset that they select manually and that is never seen by the training process. Why? So they see the results on a dataset that is not biased towards the training data.

We think that this is a critical step for the reliability and transparency of our solution.

“The more we come to rely on technology, the more important it becomes that it’s robust and trustworthy, doing what we want it to do”.

Max Tegmark, scientist and author of international bestseller Life 3.0

Training, Validation and Testing Datasets on the Ximilar Platform

Your Categorization or Tagging Task in the image recognition service contains labels. Every label must contain at least 20 images before Ximilar allows you to train the task.

When you click on the Train button, the new model/training is added to a queue. Our training process takes the next non-trained model (neural network) from the queue and starts the optimization.

The process randomly divides your images (those not labelled with the test flag) into a training dataset (80 %) and a validation dataset (20 %).
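
As a rough illustration of this random split (not Ximilar's internal code), the idea is simply to shuffle the non-test images and cut them at 80 %:

```python
# Hypothetical helper illustrating the random 80/20 split described above.
import random

def split_train_validation(images, validation_ratio=0.2, seed=None):
    images = list(images)                      # images without the test flag
    random.Random(seed).shuffle(images)
    cut = int(len(images) * (1 - validation_ratio))
    return images[:cut], images[cut:]          # (training set, validation set)

# train_set, val_set = split_train_validation(labelled_images, seed=42)
```

Because the shuffle is random, two trainings of the same task generally end up with different validation sets, which is exactly why the validation metrics can fluctuate between model versions.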

In the first phase, the model is optimized on the training dataset and evaluated on the validation dataset. The result of this evaluation is stored, and its accuracy is the number you see when you open your task or the detail of the model.

In the last training phase, the process takes all your data (the training and validation datasets together) and optimizes the model one more time. Then we identify the misclassified (failed) images and store them. You can see them at the bottom of the model page.

Now you can also mark selected images with a test flag. Before your optimized model is stored in the database (after the training phases), it is evaluated on this test dataset (the images with the test flag). This is a very useful feature if you are looking for better insights into your model. In short:

  • Creating a test dataset for your Task is optional.
  • The test dataset is stable: it contains only the images you mark with the "test" flag. In contrast, the validation dataset mentioned above is picked randomly. That's why the results (accuracy, precision, recall) on the test dataset are better for monitoring across different model versions of your Task (see the sketch after this list).
  • Your task is never optimized on the test dataset.
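
A small sketch of the last two points: because the test set is fixed, every model version is scored on exactly the same images, so its metrics are directly comparable. The model_predict function and the metric choices below are placeholders, not Ximilar's evaluation code:

```python
# Sketch: evaluating two model versions on the same fixed test set.
from sklearn.metrics import accuracy_score, precision_score, recall_score

def evaluate_on_test_set(model_predict, test_images, test_labels):
    predictions = [model_predict(img) for img in test_images]
    return {
        "accuracy": accuracy_score(test_labels, predictions),
        "precision": precision_score(test_labels, predictions, average="macro"),
        "recall": recall_score(test_labels, predictions, average="macro"),
    }

# metrics_v1 = evaluate_on_test_set(model_v1.predict, test_images, test_labels)
# metrics_v2 = evaluate_on_test_set(model_v2.predict, test_images, test_labels)
# Any difference between metrics_v1 and metrics_v2 comes from the model change,
# not from a re-drawn validation split.
```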

How to set up your test dataset?

If you want to add this test flag to some of your images, then select them and use the MODIFY button. Select the checkbox “Set as test images” and that’s it.
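
If you manage your training images through the Ximilar API rather than the App, the same flag can in principle be set programmatically. The sketch below is illustrative only: the endpoint path and the name of the test-flag field are assumptions, so please check the current API documentation before relying on them:

```python
# Illustrative sketch only: the endpoint path and the "test_image" field name
# are assumptions; verify them against the current Ximilar API documentation.
import requests

XIMILAR_TOKEN = "your-api-token"
IMAGE_ID = "id-of-an-already-uploaded-training-image"

response = requests.patch(
    f"https://api.ximilar.com/recognition/v2/training-image/{IMAGE_ID}",  # assumed path
    headers={"Authorization": f"Token {XIMILAR_TOKEN}"},
    json={"test_image": True},  # assumed field for the "Set as test images" flag
)
response.raise_for_status()
```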

Naturally, these test images must have the correct labels (categories or tags). If the labels of your task also contain these test images, every newly trained model will include the independent test evaluation. You can display these results by selecting the "Test" check button in the upper-right corner of the model detail.

To sum it up

So, if you have a set of test images, you will be able to see two results in the detail of your model:

1. Validation Set selected randomly from your training images (this is shown as default)

2. Test Set (your selected images)

We recommend adding at least 30 images per label/tag to the test set and increasing the numbers (to hundreds or thousands) over time.

We hope you will love this feature whether you are working on your healthcare, retail, or industry project. If you want to know more about insights from the model page, then read our older blog post. We are looking forward to bringing more explainability features (Explainable AI) in the future, so stay tuned.
