New AI Solutions for Card & Comic Book Collectors
Ximilar: Visual AI for Business blog, 18 Sep 2024

Discover the latest AI tools for comic book and trading card identification, including slab label reading and automated metadata extraction.

Recognize and Identify Comic Books in Detail With AI

The newest addition to our portfolio of solutions is the Comics Identification (/v2/comics_id). This service is designed to identify comics from images. While it’s still in the early stages, we are actively refining and enhancing its capabilities.

The API detects the largest comic book in an image and provides key information such as the title, issue number, release date, publisher, origin date, and creator’s name, making it ideal for identifying comic books, magazines, and manga alike.

Comics Identification by Ximilar provides the title, issue number, release date, publisher, origin date, and creator’s name.

This tool is perfect for organizing and cataloging large comic collections, offering accurate identification and automation of metadata extraction. Whether you’re managing a digital archive or cataloging physical collections, the Comics Identification API streamlines the process by quickly delivering essential details. We’re committed to continuously improving this service to meet the evolving needs of comic identification.
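Calling the Comics Identification endpoint from Python is a short exercise with the requests library. A minimal sketch, assuming the /v2/comics_id endpoint lives under the collectibles path shown below; check the API documentation for the exact URL and response fields:

```python
import requests

API_TOKEN = "__YOUR_API_TOKEN__"  # personal token from the Ximilar App
COMICS_ID_URL = "https://api.ximilar.com/collectibles/v2/comics_id"  # assumed full path

def build_records(image_urls):
    # Ximilar endpoints take a list of records, each pointing to one image
    return {"records": [{"_url": url} for url in image_urls]}

def identify_comics(image_urls):
    """Send images to the Comics Identification endpoint and return the parsed JSON."""
    response = requests.post(
        COMICS_ID_URL,
        headers={"Authorization": f"Token {API_TOKEN}"},
        json=build_records(image_urls),
    )
    response.raise_for_status()
    return response.json()
```

The returned records should then carry the fields described above (title, issue number, publisher, and so on) for the largest comic book detected in each image.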

Star Wars Unlimited, Digimon, Dragon Ball, and More Can Now Be Recognized by Our System

Our trading card identification system has already been widely used to accurately recognize and provide detailed information on cards from games like Pokémon, Yu-Gi-Oh!, Magic: The Gathering, One Piece, Flesh and Blood, MetaZoo, and Lorcana.

Recently, we’ve expanded the system to include cards from Garbage Pail Kids, Star Wars Unlimited, Digimon, Dragon Ball Super, Weiss Schwarz, and Union Arena. And we’re continually adding new games based on demand. For the full and up-to-date list of recognized games, check out our API documentation.

Ximilar keeps adding new games to the trading card game recognition system. It can easily be deployed via API and controlled in our App.

Detect and Identify Both Trading Cards and Their Slab Labels

The new endpoint slab_grade processes your list of image records to detect and identify cards and slab labels. It utilizes advanced image recognition to return detailed results, including the location of detected items and analyzed features.

Graded slab reading by Ximilar AI.

The Slab Label object provides essential information, such as the company or category (e.g., BECKETT, CGC, PSA, SGC, MANA, ACE, TAG, Other), the card’s grade, and the side of the slab. This endpoint enhances our capability to categorize and assess trading cards with greater precision. In our App, you will find it under Collectibles Recognition: Slab Reading & Identification.
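In code, pulling the essentials out of a returned Slab Label object could look like the sketch below. The field names are assumptions based on the description above, not a confirmed response schema, so verify them against the API documentation:

```python
def slab_summary(slab_label: dict) -> str:
    """Summarize one Slab Label object; field names are illustrative assumptions."""
    company = slab_label.get("company", "Other")  # e.g. BECKETT, CGC, PSA, SGC, ...
    grade = slab_label.get("grade", "?")
    side = slab_label.get("side", "front")
    return f"{company} grade {grade} ({side})"

# Illustrative record, not an actual API response
example = {"company": "PSA", "grade": "10", "side": "front"}
```

For instance, `slab_summary(example)` condenses the detected label into a single line you can show in a listing or write to a CSV column.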

Automatic Recognition of Collectibles

Ximilar built an AI system for the detection, recognition and grading of collectibles. Check it out!

New Endpoint for Card Centering Analysis With Interactive Demo

Given a single image record, the centering endpoint returns the position of a card and performs centering analysis. You can also get a visualization of grading through the _clean_url_card and _exact_url_card fields.

The _tags field indicates if the card is autographed, its side, and type. Centering information is included in the card field of the record.

The card centering API by Ximilar returns the position of a card and performs centering analysis.
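The idea behind centering analysis boils down to comparing opposite border widths around the detected card. This is a simplified sketch of that computation, not Ximilar's actual algorithm, working from a card bounding box and the image size:

```python
def centering_ratios(card_box, image_size):
    """Left/right and top/bottom centering percentages for a card bounding box
    (x1, y1, x2, y2) inside an image of the given (width, height)."""
    x1, y1, x2, y2 = card_box
    width, height = image_size
    left, right = x1, width - x2       # horizontal borders around the card
    top, bottom = y1, height - y2      # vertical borders around the card
    lr = round(100 * left / (left + right)) if left + right else 50
    tb = round(100 * top / (top + bottom)) if top + bottom else 50
    return lr, tb
```

A perfectly centered card yields (50, 50); a card shifted to the right, e.g. `centering_ratios((20, 10, 90, 90), (100, 100))`, yields (67, 50) because the left border is twice as wide as the right one.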

Learn How to Scan and Identify Trading Card Games in Bulk With Ximilar

Our new guide How To Scan And Identify Your Trading Cards With Ximilar AI explains how to use AI to streamline card processing with card scanners. It covers everything from setting up your scanner and running a Python script to analyzing results and integrating them into your website.

Let Us Know What You Think!

And that’s a wrap on our latest updates to the platform! We hope these new features might help your shop, website, or app grow traffic and gain an edge over the competition.

If you have any questions, feedback, or ideas on how you’d like to see the services evolve, we’d love to hear from you. We’re always open to suggestions because your input shapes the future of our platform. Your voice matters!

How To Scan And Identify Your Trading Cards With Ximilar AI
5 Aug 2024

A guide for collectors and businesses looking to streamline their card-processing workflow with AI.

In the world of trading card scanning and seller tools, efficiency is crucial. Applications like CollX, VGPC, or Collectr handle millions of daily card-identification requests from images, coming from hobby users as well as from people who earn cash selling trading cards. Ximilar offers similar services, providing powerful API solutions for businesses looking to effortlessly integrate visual search and image recognition functionalities into their apps or websites, with the possibility of customization.

Today, I’d like to introduce a solution specifically designed for physical stores and warehouses to process their physical card collections quickly and efficiently using card scanners like those from Fujitsu. This tutorial is tailored for shop owners who need to handle large volumes of card images rapidly. We’ve developed a simple yet powerful script in Python 3 for card identification, condition assessment or grading. It also identifies comic books and reads slab labels from companies like PSA or Beckett. The script outputs a CSV file that can be easily imported into Google Sheets or Microsoft Excel. With a few modifications, it can also be adapted for use with your Shopify store or other seller tools, such as for eBay submissions. Let’s dive in and see how this tool can streamline your card-processing workflow!

Capabilities of our AI Solution for Sports Cards and TCGs

Trading Card Games

In the previous blog post, I wrote about our REST API for identifying TCGs, sports cards, and comic book covers. The TCG identification service supports a growing number of trading card games, including the most popular ones like Pokémon, Yu-Gi-Oh!, Magic: The Gathering, One Piece, and Lorcana. For some games, it can also identify the correct language version of the card or determine whether it is a foil/holographic card. Additionally, for certain games, the system provides links or identification numbers pointing to TCGplayer. You can try how it works here.

Sports Cards

For sports cards, we can identify more than 5 million trading cards across six main sports categories: baseball, hockey, football, soccer, MMA, and basketball cards. Our system also supports the identification of parallel and reprint versions, with continuous improvements. Not only does it provide the best match, but it also offers alternative options to choose from.

If the trading cards are in slabs from major grading companies like PSA, Beckett, CGC, TAG, SGC, or ACE, the system can instantly identify graded cards and provide the slab company, grade, and certificate number.

All Under One API

As you can see, the functionality is complex, offering features such as bulk trading card scanning and language support, resulting in highly accurate identification. I believe that Ximilar Collectibles Recognition services are the most accurate solutions available on the market today. It is a true game-changer for card dealers, other collectors, or companies looking to be independent of third parties like CollX, Kronozio, or Card Dealer Pro, which automatically submit your cards to their marketplaces.

With Ximilar, you can handle your trading card scanning independently using our visual search technology and deep learning models. Our solutions are also designed to suit your specific needs through continuous improvements and customization. Whether you purchase, scan, analyze, search, or sell cards in bulk, our API empowers you to manage your collection without the constraints of third-party services.

How to Analyze TCG and Sports Card Scans With AI

Step 1 – Run The Cards Through The Scanner

Enough talk! Let’s analyze your cards in bulk. First, you’ll need a folder with images of your cards. For testing, I’ve selected a small subset of MTG and Pokémon cards. You can feed them into your scanner in a top loader (link) or individually. Most card collectors use the Fujitsu Ricoh fi-8170 scanner, which is one of the best scanners available. It can capture both the front and back sides of the cards.

For our purposes, we will only need the front side of the cards. To avoid unnecessary costs, remove the back side images from the folder or configure your scanner to store only the front side of the cards. Some scanners, like Fujitsu, can produce scan files with names such as 19032024-0001.jpg or 19032024-FRONT-0001.jpg. You can specify the naming format for the scan files. See the following video tutorial on how to set up a Fujitsu scanner via PaperStream Capture by MaxWaxPax:

My recommendation is to use similar settings for your Fujitsu scanner as shown in the video by MaxWaxPax and to create multiple profiles for sideways and top-bottom trading card scanning. Ideally, set up the scanner to produce only images of the fronts of the cards, or distinguish the images with a “front” or “back” suffix in the filename. However, if you already have an unstructured collection of card images, you can fully automate the selection of images showing the front sides using our AI Recognition of Collectibles.

Step 2 – Sign Up To Ximilar Platform

Now, you’ll need an account in our App. Simply sign up with your personal or company email to get your unique API token for service authorization. Once you are in the App, copy your API key to the clipboard and save it into some file. To access the service via API, you’ll need to purchase at least a Business plan. Both tasks – getting the API key and purchasing a Business plan – can be completed in the platform’s settings in a matter of minutes.

Sign in to the Ximilar App to see and copy your authorization token.
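Every request you send later will carry this token in an Authorization header. A minimal sketch of building those headers in Python; the "Token" scheme shown here is how the API documentation describes authorization, so double-check it there:

```python
API_TOKEN = "__YOUR_API_TOKEN__"  # copied from the Ximilar App settings

def ximilar_headers(token: str) -> dict:
    """Headers for a Ximilar API call; the token authorizes the request."""
    return {
        "Authorization": f"Token {token}",
        "Content-Type": "application/json",
    }
```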

Step 3 – Installing Python 3

Before running the script, ensure you have Python 3 installed. Some operating systems already include a version of Python, but we require at least Python 3.6. If you’re unsure, follow this tutorial on RealPython (link), which contains installation steps for Windows, macOS, and Linux:

Installation on Windows and macOS takes only a few clicks.

You should be able to run a similar command from your command line, shell, or terminal. Here’s mine on a Mac:

michallukac@Michals-MacBook ~ % python --version && pip --version

If you don’t know how to run commands, read a short tutorial on using the terminal/shell/command line. I recommend this tutorial by DjangoGirls or watching some YouTube videos (here’s one for Windows and one for macOS). The output from the command should look similar to my example:

Python 3.9.18

pip 23.1 from /Users/michallukac/env/devel/lib/python3.9/site-packages/pip (python 3.9)

Next, you will need to install the Python libraries argparse and requests via pip (argparse already ships with Python 3, so upgrading it is optional):

pip install --upgrade argparse

pip install --upgrade requests

If everything passes, you’re now ready to use the script we’ve prepared to process your folder of card images!

Step 4 – Running The Script On Trading Card Games

Running the script is simple. You’ll need to use a terminal (macOS), shell (Linux), or command line (Windows), which is why we installed Python 3. Download the following file from one of these addresses:

Put this file/script next to the folder (tcgscans) with your trading card images or scans and in the terminal, write the following command:

python process_card_scans.py --folder tcgscans --api_key YOURAPIKEY --collectible tcg --output results.csv --select_images all

Hitting Enter will execute the script on the tcgscans folder, and a progress bar will be shown. The script will analyze all the images in the folder (as specified by select_images). You can interrupt the script at any time, as it automatically stores the results to your specified output CSV file after every 10 images:

Executing the script on trading card scan recognition.
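Under the hood, a script like this is just a loop over image files with periodic CSV flushes. The sketch below illustrates the idea (the real, downloadable script is the one referenced above); the image-selection helper mirrors the behavior of the --select_images option:

```python
import csv
import os

def select_images(filenames, mode="all"):
    """Mirror the --select_images option: 'all', or every 'odd'/'even' numbered scan."""
    images = sorted(f for f in filenames if f.lower().endswith((".jpg", ".jpeg", ".png")))
    if mode == "odd":
        return images[0::2]   # 0001.jpg, 0003.jpg, ...
    if mode == "even":
        return images[1::2]   # 0002.jpg, 0004.jpg, ...
    return images

def write_csv(path, rows):
    """Write identification results (a list of dicts) to a CSV file."""
    if not rows:
        return
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=rows[0].keys())
        writer.writeheader()
        writer.writerows(rows)

def process_folder(folder, analyze, output_csv, mode="all", flush_every=10):
    """Run `analyze` (an API call returning a dict) on each selected image,
    flushing results to CSV every few images so interrupting the run is safe."""
    rows = []
    for name in select_images(os.listdir(folder), mode):
        rows.append(analyze(os.path.join(folder, name)))
        if len(rows) % flush_every == 0:
            write_csv(output_csv, rows)
    write_csv(output_csv, rows)
```

Here `analyze` stands for whatever identification call you make per image; it is a placeholder, not a function from the Ximilar SDK.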

Each analysis of a scan (sports card) will consume 10 credits from your credit supply in your Ximilar account. Our App lets you watch your credit consumption closely under Reports. The Business 100k Plan allows you to analyze 10,000 raw cards. If you need to analyze millions of cards per month or your entire collection at once, reach out to us, and we can offer you a bulk discount.

Visualization of API credit consumption per image processing operation in Ximilar App.

Step 5 – Analyzing the CSV file

Now we have our CSV file named results.csv. The CSV file contains the following fields: filename (name of the photo in the folder), status (ok or error), side (front or back), subcategory, full_name, name, year, card_number, series, set, set_code, and other additional fields.

The output format of the CSV depends on whether you analyze sports cards, TCG cards, comics, or slabs. Here is a visualization of the CSV file in Visual Studio Code:

My CSV file in Visual Studio Code.

We can import the file into a Google Sheets or Microsoft Excel spreadsheet, edit it as needed, or generate printable checklists. The columns and data from the CSV can also be easily added to your Shopify product files or used for eBay submissions.
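Before importing, you can also inspect the CSV directly in Python, for example to keep only the rows the API identified successfully:

```python
import csv
from io import StringIO

def identified_cards(csv_text: str):
    """Parse the results CSV and return only the successfully identified rows."""
    return [row for row in csv.DictReader(StringIO(csv_text)) if row.get("status") == "ok"]

# A tiny illustrative sample with the same leading columns as results.csv
sample = (
    "filename,status,side,full_name\n"
    "0001.jpg,ok,front,Charizard #4/102\n"
    "0002.jpg,error,,\n"
)
```

The same filter works on the real file via `csv.DictReader(open("results.csv", newline=""))`.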

Additional information on card condition (or grading) can be added via the script’s --condition (or --grading) parameter. For example, if your sports card scanner produces images with filenames such as 0001.jpg, 0002.jpg, 0003.jpg, etc., the following command will process the images with odd numbering (e.g., 0001.jpg, 0003.jpg, …), identify the cards (name, card number, etc.), and also compute their condition (very good, excellent, etc.):

python process_card_scans.py --folder sportsfolder --api_key YOUR_API_KEY --collectible sport --output sport.csv --select_images odd --alternative --condition

Conclusion

With Ximilar’s AI-powered solutions, identifying and documenting your trading cards has never been easier. From trading card scanning, analyzing and organizing, to finding the current average market price, every step is streamlined to save you time and effort. I hope this guide helps you optimize your trading card workflow, making it easier to manage and showcase your collection. Happy collecting, whether it’s baseball or Pokémon cards!

When OCR Meets ChatGPT AI in One API
14 Jun 2023

Introducing the fusion of optical character recognition (OCR) and conversational AI (ChatGPT) as an online REST API service.

Imagine a world where machines not only have the ability to read text but also comprehend its meaning, just as effortlessly as we humans do. Over the past two years, we have witnessed extraordinary advancements in these areas, driven by two remarkable technologies: optical character recognition (OCR) and ChatGPT (generative pre-trained transformer). The combined potential of these technologies is enormous and offers assistance in numerous fields.

That is why we at Ximilar have recently developed an OCR system, integrated it with ChatGPT, and made it available via API. It is one of the first publicly available services combining OCR software and the GPT model, supporting several alphabets and languages. In this article, I will provide an overview of what OCR and ChatGPT are, how they work, and – more importantly – how anyone can benefit from their combination.

What is Optical Character Recognition (OCR)?

OCR (Optical Character Recognition) is a technology that can quickly scan documents or images and extract text data from them. OCR engines are powered by artificial intelligence & machine learning. They use object detection, pattern recognition and feature extraction.

OCR software can actually read not only printed but also handwritten text in an image or a document and provide you with the extracted text in a file format of your choosing.

How Does Optical Character Recognition Work?

When an OCR engine is provided with an image, it first detects the position of the text. Then, it uses an AI model that reads individual characters to determine what the text in the scanned document says (text recognition).

This way, OCR tools can provide accurate information from virtually any kind of image file or document type. To name a few examples: PDF files containing camera images, scanned documents (e.g., legal documents), old printed documents such as historical newspapers, or even license plates.

A few examples of OCR: transcribing books to electronic form, reading invoices, passports, IDs, and landmarks.

Most OCR tools are optimized for specific languages and alphabets. We can tune these tools in many ways. For example, to automate the reading of invoices, receipts, or contracts. They can also specialize in handwritten or printed paper documents.

The basic outputs from OCR tools are usually the extracted texts and their locations in the image. The data extracted with these tools can then serve various purposes, depending on your needs. From uploading the extracted text to simple Word documents to turning the recognized text to speech format for visually impaired users.

OCR programs can also perform layout analysis to transform text into a table. Or they can integrate natural language processing (NLP) for further text analysis and named entity recognition (NER). For example, identifying numbers, famous people, or locations in the text, like ‘Albert Einstein’ or ‘Eiffel Tower’.

Technologies Related to OCR

You may also come across the term optical word recognition (OWR). This technology is not as widely used as optical character recognition software. It involves the recognition and extraction of individual words or groups of words from an image.

There is also optical mark recognition (OMR). This technology can detect and interpret marks made on paper or other media. It can work together with OCR technology, for instance, to process and grade tests or surveys.

And last but not least, there is intelligent character recognition (ICR). It is a specific type of OCR optimized for the extraction of handwritten text from an image. All these advanced methods share some underlying principles.

What are GPT and ChatGPT?

A generative pre-trained transformer (GPT) is an AI text model that is able to generate textual outputs based on an input (prompt). GPT models are large language models (LLMs) powered by deep learning and relying on neural networks. They are incredibly powerful tools that can handle content creation (e.g., writing paragraphs of blog posts), proofreading and error fixing, explaining concepts & ideas, and much more.

The Impact of ChatGPT

ChatGPT, introduced by OpenAI, is an extension of the GPT model that is further optimized for conversations. It has had a great impact on how we search for, work with, and process data.

GPT models are trained on huge amounts of textual data, so they know more about many topics than an average human being. In my case, ChatGPT definitely has better English writing & grammar skills than I do. Here’s an example of ChatGPT explaining quantum computing:

ChatGPT model explaining quantum computing. [source: OpenAI]

It is no overstatement to say that the introduction of ChatGPT revolutionized data processing, analysis, search, and retrieval.

How Can OCR & GPT Be Combined For Smart Text Extraction?

The combination of OCR with GPT models enables us to use this technology to its full potential. GPT can understand, analyze and edit textual inputs. That is why it is ideal for post-processing of the raw text data extracted from images with OCR technology. You can give the text to the GPT and ask simple questions such as “What are the items on the invoice and what is the invoice price?” and get an answer with the exact structure you need.

This was a very hard problem just a year ago, and a lot of companies were trying to build intelligent document-reading systems, investing millions of dollars in them. The large language models are really game changers and major time savers. It is great that they can be combined with other tools such as OCR and integrated into visual AI systems.

It can help us with many things, including extracting essential information from images and exporting it to text documents or JSON. And in the future, it can revolutionize search engines and streamline automated text translation or entire workflows of document processing and archiving.

Examples of OCR Software & ChatGPT Working Together

So, now that we can combine computer vision and advanced natural language processing, let’s take a look at how we can use this technology to our advantage.

Reading, Processing and Mining Invoices From PDFs

One of the typical applications of OCR software is reading the data from invoices, receipts, or contracts in image-only PDFs (or other documents). Imagine that some of the invoices and receipts your accounting department accepts are physical printed documents. You could scan such a document and, instead of opening it in Adobe Acrobat and doing manual data entry (which is still a standard procedure in many accounting departments today), let the automated OCR system handle the rest.

Scanned documents can be automatically sent to the API from both computers and mobile phones. The visual AI needs only a few hundred milliseconds to process an image. Then you will get textual data with the desired structure in JSON or another format. You can easily integrate such technology into accounting systems and internal infrastructures to streamline invoice processing, payments or SKU numbers monitoring.

Receipt analysis via Ximilar OCR and OpenAI ChatGPT.
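A request to such an OCR + GPT pipeline could pair each scanned document with the question you want answered. This is only a sketch of the request shape: the `prompt` field name is an assumption for illustration, so consult the API documentation for the actual format:

```python
def invoice_request(image_urls, question):
    """Build a hypothetical OCR + GPT request: scanned documents plus one GPT prompt."""
    return {
        "records": [{"_url": url} for url in image_urls],
        "prompt": question,  # assumed field name for the GPT instruction
    }

payload = invoice_request(
    ["https://example.com/receipt-0001.jpg"],
    "What are the items on the invoice and what is the invoice price?",
)
```

The response would then carry the answer in the structure your prompt asked for, ready to be stored or forwarded to an accounting system.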

Trading Card Identifying & Reading Powered by AI

In recent years, the collector community for trading cards has grown significantly. This has been accompanied by the emergence of specialized collector websites, comparison platforms, and community forums. And with the increasing number of both cards and their collectors, there has been a parallel demand for automating the recognition and cataloguing of collectibles from images.

Ximilar has been developing AI-powered solutions for some of the biggest collector websites on the market. And adding an OCR system was an ideal solution for data extraction from both cards and their graded slabs.


We developed an OCR system that extracts all text characters from both the card and its slab in the image. GPT then processes these texts and provides structured information: for instance, the name of the player, the name of the card, its grade, the grading company, or the labels from PSA.

Extracting text from the trading card via OCR and then using a GPT prompt to get relevant information.

Needless to say, we are pretty big fans of collectible cards ourselves. So we’ve been enjoying working on AI not only for sports cards but also for trading card games. We recently developed several solutions tuned specifically for the most popular trading card games such as Pokémon, Magic: The Gathering, or Yu-Gi-Oh!, and we have been adding new features and games constantly. Do you like the idea of trading card recognition automation? See how it works in our public demo.

How Can I Use the OCR & GPT API On My Images or PDFs?

Our OCR software is publicly available via an online REST API. This is how you can use it:

  1. Log into Ximilar App

    • Get your free API TOKEN to connect to API – Once you sign up to Ximilar App, you will get a free API token, which allows your authentication. The API documentation is here to help you with the basic setup. You can connect it with any programming language and any platform like iOS or Android. We provide a simple Python SDK for calling the API.

    • You can also try the service directly in the App under Computer Vision Platform.

  2. For simple text extraction from your image, call the endpoint read.

    https://api.ximilar.com/ocr/v2/read
  3. For text extraction from an image and its post-processing with GPT, use the endpoint read_gpt. To get the results in the desired structure, you will need to specify the prompt query along with your input images in the API request, and the system will return the results immediately.

    https://api.ximilar.com/ocr/v2/read_gpt
  4. The output is JSON with an ‘_ocr’ field. It contains a list of texts: the detected words and sentences, each with a polygon encapsulating its location in the image. The full_text field contains all the strings concatenated together. The API also returns the language name (“lang_name”) and language code (“lang_code”; ISO 639-1). Here is an example:

    {
      "_url": "__URL_PATH_TO_IMAGE__",
      "_ocr": {
        "texts": [
          {
            "polygon": [[53.0,76.0],[116.0,76.0],[116.0,94.0],[53.0,94.0]],
            "text": "MICKEY MANTLE",
            "prob": 0.9978849291801453
          },
          ...
        ],
        "full_text": "MICKEY MANTLE 1st Base Yankees",
        "lang_name": "english",
        "lang_code": "en"
      }
    }

    Our OCR engine supports several alphabets (Latin, Chinese, Korean, Japanese and Cyrillic) and languages (English, German, Chinese, …).
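Putting the pieces together, a short Python sketch of calling the read endpoint and collecting the concatenated text of each record. It assumes the response wraps per-image results in a records list mirroring the request; verify this against the API documentation:

```python
import requests

API_TOKEN = "__YOUR_API_TOKEN__"  # token from the Ximilar App

def full_texts(api_response: dict):
    """Extract the full_text of every record from a response like the example above."""
    return [record["_ocr"]["full_text"] for record in api_response.get("records", [])]

def read_texts(image_urls):
    """Call the OCR read endpoint and return the concatenated text per image."""
    response = requests.post(
        "https://api.ximilar.com/ocr/v2/read",
        headers={"Authorization": f"Token {API_TOKEN}"},
        json={"records": [{"_url": url} for url in image_urls]},
    )
    response.raise_for_status()
    return full_texts(response.json())
```

Splitting the parsing into `full_texts` lets you rerun it on stored responses without repeating the API call.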

Integrate the Combination of OCR and ChatGPT In Your System

All our solutions, including the combination of OCR & GPT, are available via API. Therefore, they can be easily integrated into your system, website, app, or infrastructure.

Here are some examples of up-to-date solutions that can easily be built on our platform and automate your workflows:

  • Detection, recognition & text extraction system – You can let the users of your website or app upload images of collectibles and get relevant information about them immediately. Once they take an image of the item, our system detects its position (and can mark it with a bounding box). Then, it recognizes its features (e.g., the name of the card, collectible coin, or comic book) and extracts texts with OCR, giving you text data for your website (e.g., in a table format).

  • Card grade reading system – If your users upload images of graded cards or other collectibles, our system can detect everything including the grades and labels on the slabs in a matter of milliseconds.

  • Comic book recognition & search engine – You can extract all texts from each image of a comic book and automatically match it to your database for cataloguing.

  • Giving your collection or database of collectibles order – Imagine you have a website featuring a rich collection of collectible items, getting images from various sources and comparing their prices. The metadata can be quite inconsistent amongst source websites, or be absent in the case of user-generated content. AI can recognize, match, find and extract information from images based purely on computer vision and independent of any kind of metadata.

Let’s Build Your Solution

If you would like to learn more about how you can automate the workflows in your company, I recommend browsing our page All Solutions, where we briefly explained each solution. You can also check out pages such as Visual AI for Collectibles, or contact us right away to discuss your unique use case. If you’d like to learn more about how we work on customer projects step by step, go to How it Works.

Ximilar’s computer vision platform enables you to develop AI-powered systems for image recognition, visual quality control, and more without knowledge of coding or machine learning. You can combine them as you wish and upgrade any of them anytime.

Don’t forget to visit the free public demo to see how the basic services work. Your custom solution can be assembled from many individual services. This modular structure enables us to upgrade or change any piece anytime, saving you money and time.

Predict Values From Images With Image Regression
22 Mar 2023

With image regression, you can assess the quality of samples, grade collectible items or rate & rank real estate photos.

We are excited to introduce the latest addition to Ximilar’s Computer Vision Platform. Our platform is a great tool for building image classification systems, and now it also includes image regression models. They enable you to extract values from images with accuracy and efficiency and save your labor costs.

Let’s take a look at what image regression is and how it works, including examples of the most common applications. More importantly, I will tell you how you can train your own regression system on a no-code computer vision platform. As more and more customers seek to extract information from pictures, this new feature is sure to provide Ximilar’s customers with the tools they need to stay ahead of the curve in today’s highly competitive AI-driven market.

What is the Difference Between Image Categorization and Regression?

Image recognition models are ideal for the recognition of images or objects in them, their categorization and tagging (labelling). Let’s say you want to recognize different types of car tyres or their patterns. In this case, categorization and tagging models would be suitable for assigning discrete features to images. However, if you want to predict any continuous value from a certain range, such as the level of tyre wear, image regression is the preferred approach.

Image regression is an advanced machine-learning technique that can predict continuous values within a specific range. Whenever you need to rate or evaluate a collection of images, an image regression system can be incredibly useful.

For instance, you can define a range of values, such as 0 to 5, where 0 is the worst and 5 is the best, and train an image regression task to predict the appropriate rating for given products. Such predictive systems are ideal for assigning values to several specific features within images. In this case, the system would provide you with highly accurate insights into the wear and tear of a particular tyre.

Predicting the level of tyre wear from an image is a use case for an image regression task, while a categorization task can recognize the pattern of the tyre.

How to Train Image Regression With a Computer Vision Platform?

Simply log in to Ximilar App and go to Categorization & Tagging. Upload your training pictures and under Tasks, click on Create a new task and create a Regression task.

Creating an image regression task in Ximilar App.

You can train regression tasks and test them via the same front end or with API. You can develop an AI prediction task for your photos with just a few clicks, without any coding or any knowledge of machine learning.

This way, you can create an automatic grading system able to analyze an image and provide a numerical output in the defined range.

Use the Same Training Data For All Your Image Classification Tasks

Both image recognition and image regression methods fall under the image classification techniques. That is why the whole process of working with regression is very similar to categorization & tagging models.

Working with image regression model on Ximilar computer vision platform.

Both technologies can work with the same datasets (training images), and inputs of various image sizes and types. In both cases, you can simply upload your data set to the platform, and after creating a task, label the pictures with appropriate continuous values, and then click on the Train button.

Apart from a machine learning platform, we offer a number of AI solutions that are field-tested and ready to use. Check out our public demos to see them in action.

If you would like to build your first image classification system on a no-code machine learning platform, I recommend checking out the article How to Build Your Own Image Recognition API. We defined the basic terms in the article How to Train Custom Image Classifier in 5 Minutes. We also made a basic video tutorial:

Tutorial: train your own image recognition model with Ximilar platform.

Neural Network: The Technology Behind Predicting Range Values on Images

The simplest technique for predicting float values is linear regression, which can be further extended to polynomial regression. These two statistical techniques work well on tabular input data. However, when it comes to predicting numbers from images, a more advanced approach is required. That’s where neural networks come in. Mathematically speaking, a neural network “f” can be trained to predict value “y” for picture “x”, i.e., “y = f(x)”.
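To make the contrast concrete, here is the simplest case mentioned above: a linear regression y = a·x + b fitted to tabular data by plain gradient descent. This pure-Python sketch is purely illustrative and not part of the Ximilar platform:

```python
# Minimal linear regression fitted by gradient descent -- the simple, tabular
# case described above. Image regression replaces the linear function f with
# a neural network, but the principle (minimize prediction error) is the same.
def fit_linear(xs, ys, lr=0.01, steps=5000):
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradients of the mean squared error with respect to a and b
        grad_a = sum(2 * (a * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (a * x + b - y) for x, y in zip(xs, ys)) / n
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b

# Noise-free data generated by y = 2x + 1; the fit should recover it closely.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
a, b = fit_linear(xs, ys)
```

With a neural network, the closed-form structure `a * x + b` is replaced by millions of learned parameters, but the training loop (compute error, follow the gradient) stays conceptually identical.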

Neural networks can be thought of as approximations of functions that we aim to identify through optimization on training data. The most commonly used NNs for image-based predictions are Convolutional Neural Networks (CNNs), vision transformers (ViTs), or a combination of both. These powerful tools analyze pictures pixel by pixel, and learn relevant features and patterns that are essential for solving the problem at hand.

CNNs are particularly effective in picture analysis tasks. They are able to detect features at different spatial scales and orientations. Meanwhile, ViTs have been gaining popularity due to their ability to learn visual features without being constrained by spatial invariance. When used together, these techniques can provide a comprehensive approach to image-based predictions. We can use them to extract the most relevant information from images.

What Are the Most Common Applications of Value Regression From Images?

Estimating Age From Photos

Probably the most widely known use case of image regression among the public is age prediction. You can come across age prediction features on social media platforms and mobile apps such as Facebook, Instagram, Snapchat, or FaceApp. They apply deep learning algorithms to predict a user’s age based on their facial features and other details.

While image recognition provides information on the object or person in the image, the regression system tells us a specific value – in this case, the person’s age.

Needless to say, these features are not always accurate and can sometimes produce biased results. Despite this limitation, image regression models keep gaining popularity on social sites and in apps.

Ximilar already provides a face-detection solution. Models such as age prediction can be easily trained and deployed on our platform and integrated into your system.

Value Prediction and Rating of Real Estate Photos

Pictures play an essential role on real estate sites. When people are looking for a new home or an investment, they navigate through the feed mainly by visual features. With image regression, you are able to predict the state, quality, price, and overall rating of real estate from photos. This can help with both searching for and evaluating real estate.

Predicting rating and price (regression) for household images with image regression.

Custom recognition models are also great for the recognition & categorization of the features present in real estate photos. For example, you can determine whether a room is furnished, what type of room it is, and categorize the windows and floors based on their design.

Additionally, a regression can determine the quality or state of floors or walls, as well as rank the overall visual aesthetics of households. You can store all of this information in your database. Your users can then use such data to search for real estate that meets specific criteria.

Image classification systems such as image recognition and value regression are ideal for real estate ranking. Your visitors can search the database with the extracted data.
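As a toy illustration of that kind of search, suppose each photo’s extracted tags (categorization) and score (regression) are stored as plain records; querying then becomes ordinary filtering. The field names below are hypothetical, not an actual Ximilar response format:

```python
# Illustrative only: once tags (categorization) and scores (regression) are
# stored per photo, searching the database is ordinary filtering.
listings = [
    {"id": 1, "room": "kitchen", "furnished": True,  "quality": 4.6},
    {"id": 2, "room": "bedroom", "furnished": False, "quality": 3.1},
    {"id": 3, "room": "kitchen", "furnished": True,  "quality": 2.4},
]

def search(records, room=None, min_quality=0.0):
    """Return records matching the requested room type and minimum quality."""
    return [
        r for r in records
        if (room is None or r["room"] == room) and r["quality"] >= min_quality
    ]

good_kitchens = search(listings, room="kitchen", min_quality=4.0)
```

In production, these records would live in a database or a search index, but the principle is the same: visual AI turns photos into structured fields that users can query.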

Determining the Degree of Wear and Tear With AI

Visual AI is increasingly being used to estimate the condition of products in photos. While recognition systems can detect individual tears and surface defects, regression systems can estimate the overall degree of wear and tear of things.

A good example of an industry that has seen significant adoption of this technology is insurance. Startups like Lemonade Inc. or Root use AI when processing insurance claims.

With custom image recognition and regression methods, it is now possible to automate the process of insurance claims. For instance, a visual AI system can indicate the seriousness of damage to cars after accidents or assess the wear and tear of various parts such as suspension, tires, or gearboxes. The same goes for other types of insurance, including households, appliances, or even collectible & antique items.

Our platform is commonly utilized to develop recognition and detection systems for visual quality control & defect detection. Read more in the article Visual AI Takes Quality Control to a New Level.

Automatic Grading of Antique & Collectible Items Such as Sports Cards

Apart from car insurance and damage inspection, recognition and regression are great for all types of grading and sorting systems, for instance on price comparators and marketplaces of collectible and antique items. Deep learning is ideal for the automatic visual grading of collector items such as comic books and trading cards.

By leveraging visual AI technology, companies can streamline their processes, reduce manual labor significantly, cut costs, and enhance the accuracy and reliability of their assessments, leading to greater customer satisfaction.

Automatic Recognition of Collectibles

Ximilar built an AI system for the detection, recognition and grading of collectibles. Check it out!

Food Quality Estimation With AI

Biotech, Med Tech, and Industry 4.0 also have a lot of applications for regression models. For example, they can estimate the approximate level of fruit & vegetable ripeness or freshness from a simple camera image.

The grading of vegetables by an image regression model.

For instance, a Japanese farmer famously used deep learning for cucumber sorting and quality checks. Looking for quality control or estimation of the size and other parameters of olives, fruits, or meat? You can easily create a system tailored to these use cases without coding on the Ximilar platform.

Build Custom Evaluation & Grading Systems With Ximilar

Ximilar provides a no-code visual AI platform accessible via App & API. You can log in and train your own visual AI without the need to know how to code or have expertise in deep learning techniques. It will take you just a few minutes to build a powerful AI model. Don’t hesitate to test it for free and let us know what you think!

Our developers and annotators are also able to build custom recognition and regression systems from scratch. We can help you with the training of the custom task and then with the deployment in production. Both custom and ready-to-use solutions can be used via API or even deployed offline.

The post Predict Values From Images With Image Regression appeared first on Ximilar: Visual AI for Business.

]]>
Pokémon TCG Search Engine: Use AI to Catch Them All https://www.ximilar.com/blog/pokemon-card-image-search-engine/ Tue, 11 Oct 2022 12:20:00 +0000 https://www.ximilar.com/?p=4551 With a new custom image similarity service, we are able to build an image search engine for collectible cards trading.

The post Pokémon TCG Search Engine: Use AI to Catch Them All appeared first on Ximilar: Visual AI for Business.

]]>
Have you played any trading card games? As an elementary school student, I remember spending hundreds of hours playing Lord of the Rings TCG with my friend. Back then, LOTR was in the cinemas, and the game was simply fantastic, with beautiful pictures from movies. I still remember my deck, played with a combination of Ents/Gondor and Nazguls.

Other people in our office spent their youth playing Magic The Gathering (with beautiful artworks), or collecting sports cards with their favorite athletes. In my country, basketball and ice hockey cards were really popular. Cards are still loved, played, collected, and traded by geeks, collectors, and sports fans across the world! Their market is growing, and so is the need for automation of image processing on websites and apps for collectors. Nowadays, cards can even be seen as a great investment.

Where can you use visual AI for cards?

Trading card games (トレーディングカード) can consist of tens of thousands of cards. In principle, building a basic image classifier based solely on image recognition leads to low precision and is simply not enough for more complicated problems.

However, we are able to build a complex similarity system that can recognize, categorize, and find similar cards by a picture. Once trained properly, it can deal with enormous databases of images it never encountered before. With this system, you can find all the information, such as the year of release, card title, exact value, card set, or whether it already is in someone’s collection, with just a smartphone image of the card.

Tip: Check out our Computer Vision Platform to learn about how basic image recognition systems work. If you are not sure how to develop your card search system, just contact us and we will help you.

Collectibles are a big business and some cards are really expensive nowadays. Who knows, maybe you have the card of Charizard or Kobe Bryant hidden in your old box in the attic. We can develop a system for you that can automatically analyze the bulk of trading cards sent from your customers or integrate it into your mobile/smartphone app.

Automatic Recognition of Collectibles

Ximilar built an AI system for the detection, recognition and grading of collectibles. Check it out!

What can visual search do for the trading cards market?

In the last year, we have been building a universal system able to train visual models with numerous applications in image search engines. We already offer visual search services for photo search. However, they are optimized mostly for general and fashion images. This system can be tuned to trading cards, coins, furniture & home decor, art, real estate, and more: the use cases are endless.

In the last decades, we have all witnessed the growth of the TCG community. However, technologies based on artificial intelligence have not yet been used in this market. Plus, even though the first system for scanning trading cards was released by eBay, it was not made available to small shops as an API. And since trading card games and visual AI are a perfect match, we are going to change it with a card image search.

Tip: Check out Visual Product Search to learn more about visual search applications.

Which TCG cards could visual AI help with?

An image search engine is a great approach when the number of classes for image classification is high (1,000+). With TCGs, each card represents a unique class. A convolutional neural network (CNN) trained as a classifier can have poor results when working with a larger number of classes.

Pokémon TCG contains more than 10,000 cards (classes), Magic the Gathering (MTG) over 50,000, and the same goes for basketball or any other sports cards. So basically, we can build a visual search system for both:

  • Trading card games (Magic the Gathering, Lord of the Rings, Pokémon, Yu-Gi-Oh!, One Piece, Warhammer, and so on)
  • Collectible sports cards (like Ice Hockey, Football, Soccer, Baseball, Basketball, UFC, and more)
Pokémon, Magic The Gathering, LOTR, Ice Hockey and Basketball cards.
Pokémon, Magic The Gathering, LOTR, Ice Hockey, and Basketball cards.
Yes, we are big fans of all these things 🙂
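The retrieval idea behind such a card search engine can be sketched simply: a trained neural network embeds every card image into a vector, and a query photo is matched to the closest stored vector. The card names and tiny 3-dimensional embeddings below are invented for illustration; real embeddings have hundreds of dimensions and are searched with approximate nearest-neighbor indexes rather than a linear scan:

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Toy "index" mapping known cards to their embedding vectors.
card_index = {
    "Charizard, Base Set": [0.9, 0.1, 0.3],
    "Pikachu, Jungle":     [0.2, 0.8, 0.5],
    "Black Lotus, Alpha":  [0.4, 0.4, 0.9],
}

def identify(query_embedding, index, top_k=1):
    """Return the top_k card names most similar to the query embedding."""
    ranked = sorted(index,
                    key=lambda name: cosine(query_embedding, index[name]),
                    reverse=True)
    return ranked[:top_k]

# A query photo whose embedding lands close to the Charizard vector.
best = identify([0.85, 0.15, 0.35], card_index)
```

Because the matching happens in embedding space, the index can keep growing with new sets and cards the network never saw during training, which is exactly what makes this approach scale past a fixed-class classifier.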

Visual search/recognition technology is starting to be used on eBay when listing trading and sports cards for sale. However, this is only available in the eBay smartphone app. The app has a built-in scanning tool for cards and can find the average price with additional info.

Our service for card image search can be integrated into your website or application. And you can simply connect via API through a smartphone, computer, or sorting machine to find exact cards by photo, saving a lot of time and improving the user experience!

We have recently been training an AI (neural network) model for Pokémon, Yu-Gi-Oh!, and Magic The Gathering trading cards. Why these? Pokémon is the most played TCG in the world: the game has simple rules and an enormous fan base. MTG and Yu-Gi-Oh! are also very popular. Some cards are really expensive, but more importantly, they are traded heavily!

With this model, we built a reverse search that finds the exact Pokémon, MTG, or Yu-Gi-Oh! card, achieving 94%+ accuracy (i.e., exact image match). And we are still talking about a beta prototype that can be improved to almost 100%. This search system can return the edition of the card, its language, name, year of release, and much more.

If you would like to try the system on these three trading card games, then the endpoint for card identification (/v2/tcg_id) from the Collectibles Recognition service is the right choice for you. If you need to tune it on your image collections, or have any other games or cards (e.g., sports cards), just contact us and we can build a similar service for you.
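As a rough sketch, a request to the card identification endpoint mentioned above could look like the following. The base URL, header format, and record fields are assumptions based on common REST conventions, so verify everything against the official API documentation after signing up:

```python
# Hedged sketch of querying the card identification endpoint (/v2/tcg_id).
# The base URL and the "records"/"_url" request shape are assumptions, not
# a guaranteed contract -- check the official Ximilar API docs.
ENDPOINT = "https://api.ximilar.com/collectibles/v2/tcg_id"

def build_payload(image_url):
    """Assemble a JSON body with one image record to analyze."""
    return {"records": [{"_url": image_url}]}

payload = build_payload("https://example.com/my-card-photo.jpg")

# Sending the request (requires a real API token and the `requests` package):
# import requests
# resp = requests.post(
#     ENDPOINT,
#     headers={"Authorization": "Token YOUR_API_TOKEN"},
#     json=payload,
# )
# print(resp.json())  # card name, set, year, language, ...
```

The same request shape should also work from a smartphone app or a sorting machine, since it is plain HTTP with a JSON body.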

Automatic grading and inspection of cards with AI

A lot of companies are grading sports & trading cards manually. Our visual AI can be trained to detect corner types, scratches, surface wear, light staining, creases, focus, and borders. The image recognition models are able to identify marks, miscuts, off-center printing, print defects, and other special attributes.

For example, PSA is a company that has developed its own card grading standards (e.g., MINT). With our platform and team, you can automate the entire grading workflow with just one photo. We provide several solutions for computing card grades and card condition.

PSA graded baseball card. Automatic grading is possible with machine learning.

With the new custom similarity service, we are able to create a custom solution for trading card image search in a matter of weeks. The process for developing it is quite simple:

  1. We will schedule a call and talk about your goals. We will agree on how we will obtain the training data that are necessary to train your custom machine-learning model for the search engine.
  2. Our machine-learning specialists will assemble a testable image search collection and train a custom machine-learning model for you in a matter of weeks.
  3. After meeting all the requirements of PoC, we will deploy the system to production, and you can connect to it via Rest API.

Image Recognition of Collectibles

Machine learning models bring endless possibilities not only to pop culture geeks and collectors, but to all fields and industries. From personalized recommendations in custom fashion search engines to automatic detection of slight differences in surface materials, the visual AI gets better and smarter every day, making routine tasks a matter of milliseconds. That is one of the reasons why it is an unlimited resource of curiosity, challenges, and joy for us, including being geeks – professionally :).

Ximilar is currently releasing a ready-to-use computer vision service able to recognize collectibles such as TCG cards, coins, banknotes, or postage stamps, detect their features, and categorize them. Let us know if you’d like to implement it on your website!

If you are interested in a customized AI solution for collector items write us an email and we will get back to you as soon as possible. If you would like to identify cards with our Collectibles Recognition service just sign up via app.ximilar.com.

The post Pokémon TCG Search Engine: Use AI to Catch Them All appeared first on Ximilar: Visual AI for Business.

]]>
Explainable AI: What is My Image Recognition Model Looking At? https://www.ximilar.com/blog/what-is-your-image-recognition-looking-at/ Tue, 07 Dec 2021 14:16:20 +0000 https://www.ximilar.com/?p=3185 With the AI Explainability in Ximilar App, you can see which parts of your images are the most important to your image recognition models.

The post Explainable AI: What is My Image Recognition Model Looking At? appeared first on Ximilar: Visual AI for Business.

]]>
There are many challenges in machine learning, and developing a good model is one of them. Even though neural networks are very powerful, they have a great weakness. Their complexity makes it hard to understand how they reach their decisions. This might be a problem when you want to move from development to production, and it might eventually cause your whole project to fail. But how can you measure the success of a machine learning model? The answer is not easy. In our opinion, the model must excel in a production environment and should work reliably in both common and uncommon situations.

However, even when the results in production are good, there are areas where we can’t simply accept black-box decisions without being sure how the AI made them. These are typically medicine and biotech, or any other field where there is no room for error. We need to make sure that both the output and the way our model reached its decision make sense – we need explainable AI. For these reasons, we introduced a new feature to our Image Recognition service called Explain.

Training Image Recognition

Image Recognition is a Visual AI service enabling you to train custom models to recognize images or objects in them. In Ximilar App, you can use Categorization & Tagging and Object Detection, which can be combined with Flows. For example, the first task will detect all the human models in the image and the categorization & tagging tasks will categorize and tag their clothes and accessories.
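The detect-then-categorize chaining described above can be sketched in a few lines of plain Python. The stub functions below stand in for real model calls, and their outputs are invented for illustration:

```python
# Pure-Python sketch of the chaining idea behind Flows: run detection first,
# then apply categorization & tagging to every detected object.
def detect_people(image):
    # Stand-in for an object detection task: returns bounding boxes.
    return [{"box": (10, 10, 120, 300)}, {"box": (140, 20, 260, 310)}]

def tag_clothes(image, box):
    # Stand-in for a categorization & tagging task applied to one crop.
    return {"category": "dress", "tags": ["floral", "summer"]}

def flow(image):
    """Chain the two tasks: one tagged result per detected person."""
    results = []
    for detection in detect_people(image):
        labels = tag_clothes(image, detection["box"])
        results.append({**detection, **labels})
    return results

output = flow("photo.jpg")
```

With Flows, this orchestration happens server-side behind a single API call, so the client never has to crop images or sequence the models itself.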

Image recognition is a very powerful technology, bringing automation to many industries. It requires well-trained models, and, in the case of object detection, precise data annotation. If you are not familiar with using image recognition on our platform, please try to set up your own classifier first.

These resources should be helpful in the beginning:

From model-centric to data-centric with explainable AI

Explaining which areas are important for the leaf disease recognition model when predicting a label called “canker”.

When you want a model which performs great in a production setting and has high accuracy, you need to focus on your training data first. Consistent labelling, cleaning datasets of unnecessary samples/labels, and adding missing feature-rich samples are much more important than the newest neural network architecture. Andrew Ng, an entrepreneur and professor at Stanford, also promotes this approach to building machine learning models.

The Explain feature in our App tells you:

  • which parts of images (features and pixels) are important for predicting specific labels
  • for which images the model will probably predict the wrong results
  • which samples should be added to your training dataset to improve performance

Simple Example: T-shirt or Not?

Let’s look at this simple example of how explainable AI can be useful. Let’s say we have a task containing two categories – t-shirts and shoes. For a start, we have 20 images in each category. It is definitely not enough for production, but it is enough if you want to experiment and learn.

Our neural network trained with the Ximilar SaaS platform has two labels: shoes and t-shirt.

After playing with the advanced options and short training, the result seems really promising:

Using Explain on a Training Image

But did the model actually learn what we wanted? To check what the neural network finds important when categorizing our images, we will apply two different methods with the Explain tool:

  • Grad-CAM (first published in 2016) – this method is very fast, but the results are not very precise
  • Blur Integrated Gradients (published in 2020) smoothed with SmoothGrad – this method provides much more details, but at the cost of computational time
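To give an intuition for what the Grad-CAM method computes, here is its core combination step sketched in pure Python on tiny invented tensors: each convolutional activation map gets a weight equal to the average gradient of the predicted label over that map, and the heatmap is the ReLU of the weighted sum. Real implementations run this on the activations and gradients of the network’s last convolutional layer; the 2×2 maps below are made up purely for illustration:

```python
# Core of Grad-CAM: weight each activation map A_k by alpha_k (the global
# average of the label's gradient over that map), sum them up, and keep only
# positive contributions with a ReLU.
def grad_cam(activation_maps, gradient_maps):
    rows, cols = len(activation_maps[0]), len(activation_maps[0][0])
    heatmap = [[0.0] * cols for _ in range(rows)]
    for act, grad in zip(activation_maps, gradient_maps):
        # alpha_k: global average pooling of the gradients of map k
        alpha = sum(sum(row) for row in grad) / (rows * cols)
        for i in range(rows):
            for j in range(cols):
                heatmap[i][j] += alpha * act[i][j]
    # ReLU: only features with a positive influence on the label are kept
    return [[max(0.0, v) for v in row] for row in heatmap]

# Two invented 2x2 activation maps; the first has positive gradients (it
# supports the label), the second has negative ones (it suppresses it).
acts = [[[1.0, 0.0], [0.0, 2.0]], [[0.0, 3.0], [1.0, 0.0]]]
grads = [[[0.4, 0.4], [0.4, 0.4]], [[-0.2, -0.2], [-0.2, -0.2]]]
heat = grad_cam(acts, grads)
```

The bright cells of the resulting heatmap are the image regions the model “looked at” for the label; Blur Integrated Gradients follows a different, more expensive recipe but produces a conceptually similar attribution map.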
Grad-CAM result of the Explain feature. As you can see, the model is looking mostly at the head/face.
Blur Integrated Gradients result; the most important features are the head/face, similar to what Grad-CAM is telling us.

In this case, both methods clearly demonstrate the problem of our model. The focus is not on the t-shirt itself, but on the head of the person wearing it. In the end, it was easier for the learning algorithm and the neural network to distinguish between the two categories using this feature instead of focusing on the t-shirt. If we look at the training data for label t-shirt, we can see that all pictures include a person with a visible face.

Data for the T-shirt label of the image recognition task. This small dataset contains only photos with visible faces, which can be a problem.

Explainability After Adding New Data

The solution might be adding more varied training data and introducing images without a person. Generally, it’s a good approach to start with a small dataset and grow it over time. Adding visually diverse images helps prevent the model from overfitting on the wrong features. So we added more photos to the label and trained the model again. Let’s see what the results look like with the new version of the model:

After retraining the model on new data, we can see an improvement in which features the neural network is looking for.

The Grad-CAM result on the left is not very convincing in this case. The image on the right shows the result of Blur Integrated Gradients. Here you can see how the focus moved from the head to the t-shirt. It seems like the head still plays some part, but there is much less focus on it.

Both methods for explainable AI have their drawbacks, and sometimes we have to try more pictures to get a better understanding of model behaviour. We also need to mention one important point. Due to the way the algorithm works, it tends to prefer edges, which is clearly visible in the examples.

Summary

The explainability and interpretability of neural networks is a big research topic, and we are looking forward to adopting and integrating more techniques into our SaaS AI solution. The AI Explainability we showed you is only one of many tools on the road towards data-centric AI.

If you have any troubles, do not hesitate to contact us. The machine learning specialists of Ximilar have vast experience with different kinds of problems, and are always happy to help you with yours.

The post Explainable AI: What is My Image Recognition Model Looking At? appeared first on Ximilar: Visual AI for Business.

]]>
How to deploy object detection on Nvidia Jetson Nano https://www.ximilar.com/blog/how-to-deploy-object-detection-on-nvidia-jetson-nano/ Mon, 18 Oct 2021 12:13:16 +0000 https://www.ximilar.com/?p=6124 We developed a computer vision system for object detection, counting, and tracking on Nvidia Jetson Nano.

The post How to deploy object detection on Nvidia Jetson Nano appeared first on Ximilar: Visual AI for Business.

]]>
At the beginning of summer, we received a request for a custom project for a camera system in a factory located in Africa. The project involved the detection, counting, and visual quality control of items on the conveyor belts of a factory with the help of visual AI. So we developed a complex system with neural networks on a small computer called Jetson Nano. If you are curious about how we did it, this article is for you. And if you need help with building similar solutions for your factory, our team and tools are here for you.

What is NVIDIA Jetson Nano?

There were two reasons why using our API was not an option. First, the factory has unstable internet connectivity. Second, the entire solution needs to run in real time. So we chose to experiment with embedded hardware that can be deployed in such an environment, and we are very glad that we found Nvidia Jetson Nano.


Jetson Nano is an amazing small computer (embedded or edge device) built for AI. It allows you to do machine learning in a very efficient way with low power consumption (about 5 watts). It can be part of IoT (Internet of Things) systems, runs Ubuntu Linux, and is suitable for simple robotics or computer vision projects in factories. However, if you know that you will need to detect, recognize, and track tens of different labels, choose a higher-end version of the Jetson embedded hardware, such as the Xavier. It is a much faster device than the Nano and can solve more complex problems.

What is Jetson Nano good for?

Jetson is great if:

  • You need a real-time analysis
  • Your problem can be solved with one or two simple models
  • You need a budget-friendly solution that is cost-effective to run
  • You want to connect it to a static camera – for example, monitoring an assembly line
  • The system cannot be connected to the internet – for example, because your factory is in a remote place or for security reasons

The biggest challenges in Africa & South Africa remain connectivity and accessibility. AI systems that can run in-house and offline have great potential in such environments.

Deloitte: Industry 4.0 – Is Africa ready for digital transformation?

Object Detection with Jetson Nano

If you need real-time object detection, use the YOLOv4-tiny model from the AlexeyAB/darknet repository. Other, more powerful architectures are available as well. Here is a table of the mAP and FPS you can expect when using YOLOv4-tiny on Jetson:

Architecture       mAP @ 0.5   FPS
yolov4-tiny-288    0.344       36.6
yolov4-tiny-416    0.387       25.5
yolov4-288         0.591       7.93
Source: GitHub

After training is completed, the next step is converting the weights to the TensorRT runtime. TensorRT runtimes make a substantial difference in speed performance on Jetson Nano. So train the model with AlexeyAB/darknet and then convert it with the tensorrt_demos repository. The conversion has multiple steps: you first convert the darknet YOLO weights to ONNX and then from ONNX to TensorRT.

There is always a trade-off between accuracy and speed. If you do not require a fast model, we also have good experience with CenterNet, which can achieve a really nice mAP with precise boxes. In our experience, models run with TensorFlow or PyTorch backends are slower than the YOLO models. Luckily, we can train both architectures and export them in a format suitable for Nvidia Jetson Nano.

Image Recognition on Jetson Nano

For any image categorization problem, I would recommend using a simple architecture such as MobileNetV2. You can select, for example, a depth multiplier of 0.35 and an image resolution of 128×128 pixels. This way, you can achieve great performance in both speed and precision.

We recommend using the TFLite backend when deploying the recognition model on Jetson Nano. So train the model with the TensorFlow framework and then convert it to TFLite. You can train recognition models on our platform without any coding, for free. Just visit Ximilar App, where you can develop powerful image recognition models and download them for offline usage on Jetson Nano.

A simple Object Detection camera system with the counting of products can be deployed offline in your factory with Jetson Nano.

Jetson Nano is simple but powerful hardware. However, it is not as powerful as your laptop or desktop computer. That’s why analyzing 4K images on Jetson will be very slow. I would recommend using a camera resolution of 1080p at most. We used a Raspberry Pi camera, which works very well with Jetson, and the installation is easy!

I should mention that with Jetson Nano, you can come across some temperature issues. Jetson is normally shipped with a passive cooling system. However, if this small piece of hardware is to run stably in a factory 24 hours a day, we recommend using an active cooling system. Don’t forget to run the following command so the fan on your Jetson starts working:

sudo jetson_clocks --fan

Installation steps & tips for development

When working with Jetson Nano, I recommend following Nvidia’s guidelines, for example on how to install the latest TensorFlow version. There is a great tool called jtop, which visualizes hardware stats such as GPU frequency, temperature, memory usage, and much more:

The jtop tool can help you monitor hardware statistics on Nvidia Jetson Nano.

Remember, the Jetson’s CPU and GPU share memory. You can easily run out of the 4 GB when running a model and some programs alongside it. If you want to save more than 0.5 GB of memory on Jetson, run Ubuntu with the LXDE desktop environment, which is more lightweight than the default Ubuntu environment. To increase available memory, you can also create a swap file. But be aware that if your project requires a lot of memory, heavy swapping can eventually wear out your microSD card. More great tips and hacks can be found on the JetsonHacks page.

To improve the Jetson's speed, you can also try these two commands, which set the maximum power mode and clock frequencies:

sudo nvpmodel -m0
sudo jetson_clocks

When using the latest image for the Jetson, make sure you are working with the right version of the OpenCV library. For example, some older tracking algorithms from OpenCV, such as MOSSE or KCF, require a specific version. For tracking solutions, I recommend looking at the PyImageSearch website.

Developing on Jetson Nano

The experience of programming challenging projects, exploring new gadgets, and helping our customers is something that deeply satisfies us. We are looking forward to trying other hardware for machine learning such as Coral from Google, Raspberry Pi, or Intel Movidius for Industry 4.0 projects.

Most of the time, we are developing a machine learning API for large e-commerce sites. We are really glad that our platform can also help us build machine learning models on devices running in distant parts of the world with no internet connectivity. I think that there are many more opportunities for similar projects in the future.

The post How to deploy object detection on Nvidia Jetson Nano appeared first on Ximilar: Visual AI for Business.

Flows – The Game Changer for Next-Generation AI Systems https://www.ximilar.com/blog/flows-the-game-changer-for-next-generation-ai-systems/ Wed, 01 Sep 2021 15:25:28 +0000 https://www.ximilar.com/?p=5213 Flows is a service for combining machine learning models for image recognition, object detection and other AI services into API.

The post Flows – The Game Changer for Next-Generation AI Systems appeared first on Ximilar: Visual AI for Business.

We have spent thousands of man-hours on this challenging subject. Gallons of coffee later, we introduced a service that might change how you work with data in machine learning & AI. We named this solution Flows. It enables simple and intuitive chaining and combining of machine learning models. This simple idea speeds up the workflow of setting up complex computer vision systems and brings unprecedented scalability to machine learning solutions.

We are here to offer a lot more than just model training, as common AI companies do. Our purpose is not to develop AGI (artificial general intelligence) that is going to take over the world, but easy-to-use AI solutions that can revolutionize many areas of both business and daily life. So, let's dive into the possibilities of Flows in this 2021 update of one of our most-viewed articles.

Flows: Visual AI Setup Cannot Get Much Easier

In general, on our platform you can break your machine learning problem down into smaller, separate parts (recognition, detection, and other machine learning models called tasks) and then easily chain & combine these tasks with Flows to achieve the full complexity and hierarchical classification of a visual AI solution.

A typical simple use case is conditional image processing. For instance, the first recognition task filters out non-valid images, then the next one decides a category of the image and, according to the result, other tasks recognize specific features for a given category.

Hierarchical classification with Ximilar Flows service is easy. Flows can help you to build powerful computer vision system.
Simple use of machine learning models combination in a flow

Flows allow your team to review and change datasets of all complexity levels fast and without any trouble. It doesn’t matter whether your model uses three simple categories (e.g. cats, dogs, and guinea pigs) or works with an enormous and complex hierarchy with exceptions, special conditions, and interdependencies.

It also enables you to review the whole dataset structure, analyze, and, if necessary, change its logic due to modularity. With a few clicks, you can add new labels or models (tasks), change their chaining, change the names of the output fields, etc. Neat? More than that!

Think of Flows as Zapier or IFTTT in AI. With flows, you simply connect machine learning models, and review the structure anytime you need.

Define a Flow With a Few Clicks

Let’s assume we are building a real estate website, and we want to automatically recognize different features that we can see in the photos. Different kinds of apartments and houses have various recognizable features. Here is how we can define this workflow using recognition flows (we trained each model with a custom image recognition service):

An example of real estate classifier made of machine learning models combined with flows.
An example of real estate classifier made of machine learning models combined in a flow

The image recognition models are chained in a "main" flow. Its central element is the branch selector, which saves the result in the same way as a recognition task node and also chooses an action based on the result of this task. First, we let the top category task recognize the type of estate (Apartment vs. Outdoor house). If it is an apartment, we can see that two subsequent tasks are "Apartment features" and "Room type".

A flow can also call other flows, so-called nested flows, and delegate part of the work to them. If the image is an outdoor house, we continue processing by another nested flow called “Outdoor house”. In this flow, we can see another branching according to the task that recognizes “House type”. Different tasks are called for individual categories (Bungalow, Cottage, etc.):

An example use of nested flows. The main flow calls another nested flows to process images based on their category.
An example use of nested flows – the main flow calls other nested flows to process images based on their category
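To make the branching logic concrete, here is a minimal sketch of how such a flow could be modeled in plain code. The task names follow the real estate example above, but the `run_task` stub and the dict-based flow definition are illustrative inventions, not Ximilar's internal format:

```python
# A hypothetical, minimal model of a branching flow: each node runs a
# recognition "task" (stubbed here by a dict of canned predictions), saves the
# result, and then recurses into the nested flow registered for that result.

def run_task(task_name, image, predictions):
    # Stub for a recognition task call; `predictions` maps (task, image) -> label.
    return predictions[(task_name, image)]

def run_flow(flow, image, predictions, result=None):
    result = {} if result is None else result
    label = run_task(flow["task"], image, predictions)
    result[flow["output_field"]] = label
    # Branch selector: continue with the nested flow registered for this label.
    nested = flow.get("branches", {}).get(label)
    if nested is not None:
        run_flow(nested, image, predictions, result)
    return result

# Taxonomy following the real estate example.
flow = {
    "task": "Top category", "output_field": "estate_type",
    "branches": {
        "Apartment": {"task": "Room type", "output_field": "room"},
        "Outdoor house": {"task": "House type", "output_field": "house"},
    },
}

predictions = {
    ("Top category", "img1"): "Apartment",
    ("Room type", "img1"): "Kitchen",
}
print(run_flow(flow, "img1", predictions))
# {'estate_type': 'Apartment', 'room': 'Kitchen'}
```

The point of the sketch is the recursion: a nested flow is just another node of the same shape, so branching can go arbitrarily deep.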

Flow Elements We Used

So far, we have used three elements:

  • A recognition task, which simply calls a given task and saves the result into an output field with a specified name; no other logic is involved.
  • A branch selector, on the other hand, saves the result in the same way as a recognition task node, but then chooses an action based on the result of this task.
  • A nested flow – another flow of tasks that is called by the "main" flow (branch selector).

Implicitly, there is also a List element present in some branches. We do not need to create it: as soon as we add two or more elements to a single branch, a list is generated in the background. All nodes in a list are normally executed in parallel, but you can also set sequential execution, in which case a reordering button will appear.

Branch Selector – Advanced Settings

The branch selector is a powerful element, and it is worthwhile to explore what it can do. Let's go through the most important options. In a single branch, by default, only the action for the tag or category with the highest relevance will be performed, provided the relevance (the probability output by the model) is above 50 %. But we can change this in the advanced settings: we can specify the threshold value and also enable parallel execution of multiple branches!

The advanced settings of a branch selector, enabling to skip a task of a flow.
The advanced settings of a branch selector, enabling to skip a task of a flow

You can also specify the format of the results. Flat JSON means that results from all branches will be saved on the same level as any previous outcomes, so if the same output name appears in multiple branches, one result can overwrite another – parallel execution guarantees neither the order nor which result wins. You can prevent this by selecting nested JSON, which saves the results from each branch under a separate key based on the branch name (that is, the tag/category name).
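The difference between the two formats can be sketched in a few lines. This is an illustrative model, not Ximilar's actual implementation:

```python
# Two branches that happen to use the same output name ("features").
branch_results = {
    "Apartment": {"features": "balcony"},
    "Room type": {"features": "kitchen"},   # same output name as above
}

def merge_flat(branch_results):
    # Flat JSON: everything on one level; a later branch overwrites an earlier one.
    merged = {}
    for fields in branch_results.values():
        merged.update(fields)
    return merged

def merge_nested(branch_results):
    # Nested JSON: each branch keeps its results under its own key.
    return {branch: dict(fields) for branch, fields in branch_results.items()}

print(merge_flat(branch_results))    # {'features': 'kitchen'} - one result lost
print(merge_nested(branch_results))  # both results preserved under branch keys
```

With parallel execution, which branch "wins" in the flat format is not even deterministic, which is exactly why the nested format is the safer default for colliding names.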

If some data (an output_field) are already present in the incoming request, we can skip the branch selector processing altogether. You can define this in If Output Field Exists. This way, we can save credits and also improve the precision of the system. I will show you how useful this behaviour can be in the next paragraphs. To learn about the advanced options of training, check this article.

An Example: Fashion Detection With Tags

We have just created a flow to tag simple and basic pictures. That is cool. But can we really use it in real-life applications? Probably not. The reason is that most pictures usually contain more than one clothing item. So how are we going to automate the tagging of more complex pictures? The answer is simple: we can integrate object detection into flows and combine it with recognition & tagging models!

Example of Fashion Tagging combined with Object Detection in Ximilar App
Example of Fashion Tagging combined with Object Detection in Ximilar App

The flow structure then exactly mirrors the rich product taxonomy: each image goes through a taxonomy tree in order to get the proper tags. At the root is our "top classifier" – a flow that determines which of our seven top categories a fashion product image belongs to, and thereby how the image will be further classified. For instance, if it is a "Clothing" product, the image continues to the "Clothing tagging" flow.

A “top classifier” – a flow that can tell one of our seven top categories of a fashion product image.

Similar to categorization or tagging, there are two basic nodes for object detection: the Detection Task for simple execution of a given task and Object Selector, which enables the processing of the detected objects.

Object Selector will call the object detection task. The detected objects will be extracted out of the image and passed further to any of the available nodes. Yes, any of them! Even another Object Selector, if, for example, you need to first detect people and then detect clothes on each person separately.

Object Selector – Advanced Settings

Object Selector behavior can be customized in similar ways as a Branch Selector. In addition to the Probability Threshold, there is also an Area Threshold. By default, all objects are processed; by setting this threshold, objects that do not take up at least a given percentage of the image are simply ignored. Processing can also be limited to a single object – chosen by probability or by area – in Select. As I mentioned, we extract the object before further processing. We can extend the crop a bit to include some context using Expand Bounding Box by…

Advanced settings of the Object Selector in a flow, where you can set the percentage of the image an object should occupy in order to be processed.
Setting a threshold for the space an object should occupy in order to be processed
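The area threshold and the bounding-box expansion amount to simple geometry. Here is a sketch assuming boxes in `(xmin, ymin, xmax, ymax)` pixel coordinates; the `bound_box` key and the object dicts are illustrative, not the exact API format:

```python
def box_area_fraction(box, img_w, img_h):
    """Fraction of the image covered by box = (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = box
    return ((xmax - xmin) * (ymax - ymin)) / float(img_w * img_h)

def filter_by_area(objects, img_w, img_h, area_threshold=0.1):
    """Keep only detected objects covering at least `area_threshold` of the image."""
    return [o for o in objects
            if box_area_fraction(o["bound_box"], img_w, img_h) >= area_threshold]

def expand_box(box, img_w, img_h, pct=0.1):
    """Grow the box by `pct` of its size on each side, clamped to the image."""
    xmin, ymin, xmax, ymax = box
    dx, dy = (xmax - xmin) * pct, (ymax - ymin) * pct
    return (max(0, xmin - dx), max(0, ymin - dy),
            min(img_w, xmax + dx), min(img_h, ymax + dy))

objects = [{"name": "dress", "bound_box": (100, 100, 300, 400)},
           {"name": "button", "bound_box": (10, 10, 20, 20)}]
print(filter_by_area(objects, 640, 480))  # the tiny "button" box is dropped
```

Expanding the crop slightly before recognition often helps, because a tag like "sleeve length" needs a bit of surrounding context, not just the tight detection box.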

A Typical Flows Application: Fashion Tagging

We have been playing with the fashion subject since the inception of Ximilar. It is the most challenging and also the most promising one. We have created all kinds of tools and helpers for the fashion industry, namely Fashion Tagging, specialized Fashion Search, or Annotate. We are proud to have a very precise automatic fashion tagging service with a rich fashion taxonomy.

And, of course, Fashion Tagging is internally powered by Flows. It is a huge project with several dozens of features to recognize, about a hundred recognition tasks, and hundreds of labels all chained into several interconnected flows. For example, this is what our AI says about a simple dress now – and you can try it on your picture in the public demo.

Example of fashion attributes assigned to a dress by Ximilar Fashion Tagging flow.
Example of fashion attributes assigned to a dress by Ximilar Fashion Tagging flow

Include Pre-trained Services In Your Flow

The last group of nodes at your disposal are Ximilar services. We are working hard on an ever-growing number of ready-to-use services, which can be called through our API and integrated into your project. It is natural for our users to combine several AI services, and flows make it easier than ever. At this moment, you can call these ready-to-use recognition services:

But more will come in the future, for example, Remove Background.

Increasing Possibilities of Flows

As our app and list of services grow, so do the flows. There are two features we are currently looking forward to. We are already building custom similarity models for our customers. As soon as they are ready, they will be available for combining in flows. And there is one more item very high on our list, which is predicting numeric values. Regression, in machine learning terms. Stay tuned for more exciting news!

Create Your Flow – It’s Free

Before Flows, setting up an AI vision process was a tedious task for a skilled developer. Now everyone can set up, manage, and alter the steps on their own, in a comprehensible, visual way – optimizing the process quickly, getting faster responses, saving time and expenses, and delivering higher quality to customers.

And what's the best part? Flows are available to the users of Ximilar's free plan, so you can try them right away. Sign up or log in to the Ximilar App and open the Flows service from the Dashboard. If you want to learn the basics first, check out our video tutorials. Then you can connect tasks and labels defined in your own Image Recognition service.

Training machine learning models is free with Ximilar; you only pay for the API calls for recognition. Read more about API calls or API credit packs. We strongly believe you will love Flows as much as we enjoyed bringing them to life. And if you feel like there is a feature missing, or if you prefer a custom-made solution, feel free to contact us!

How to Build Your Own Image Recognition API? https://www.ximilar.com/blog/how-to-build-your-own-image-recognition-api/ Fri, 16 Jul 2021 10:38:27 +0000 https://www.ximilar.com/?p=5078 Tips and tricks for developing and improving your custom image recognition models and deploying them as API with the Ximilar platform.

The post How to Build Your Own Image Recognition API? appeared first on Ximilar: Visual AI for Business.

Image recognition systems are still young, but they are becoming more accessible every day. Custom image recognition APIs are typically used for better filtering and recommendation of products in e-shops, sorting stock photos, and classification of errors or pathological findings. Ximilar, like Apple's Vision SDK or Google's TensorFlow, makes the training of custom recognition models easy and affordable. However, not many people and companies have been using this technology to its full potential so far.

For example, I recently had a conversation with a client who said that Google Vision didn't work for him and returned irrelevant tags. The problem was not the API but the approach to it. He had employed a few students to do the labelling job and create an image classifier, but the results were not good at all. After we showed him our approach and shared some tips and simple rules, he got better classification results almost immediately. This post should serve as a comprehensive guide for those who build their own image classifiers and want to get the most out of them.


How to Begin

Image recognition is based on the techniques of machine learning and computer vision. It is able to categorize images and assign tags describing the attributes recognized in them. You can read everything about the service and its possibilities here.

To train your own Image Recognition models and create a system accessible through API, you will first need to upload a set of training images and create your image recognition tasks (models). Then you will use the training set to train the models to categorize the images.

If you need your images to be tagged, you should upload or create a set of tags and train tagging tasks. As the last step, you can combine these tasks into a Flow, and modify or replace any of them anytime due to its modular structure. You can then gradually improve your accuracy based on testing, evaluation metrics and feedback from your customers. Let’s have a look at the basic rules you should follow to reach the best results.
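Once a task or flow is trained, calling it is a plain JSON POST. The sketch below only builds the request pieces without sending anything; the endpoint URL and the `task_id`/`records`/`_url` field names are assumptions to be checked against the current Ximilar API documentation, and the token is a placeholder:

```python
import json

# Assumed endpoint - verify against the current Ximilar API reference.
XIMILAR_CLASSIFY_URL = "https://api.ximilar.com/recognition/v2/classify"

def build_classify_request(task_id, image_urls, token):
    """Build the headers and JSON body for a recognition call.

    Field names ("task_id", "records", "_url") are assumptions based on
    Ximilar's public docs, not guaranteed by this sketch.
    """
    headers = {
        "Authorization": f"Token {token}",
        "Content-Type": "application/json",
    }
    body = {
        "task_id": task_id,
        "records": [{"_url": url} for url in image_urls],
    }
    return headers, json.dumps(body)

headers, body = build_classify_request(
    "my-task-id", ["https://example.com/dress.jpg"], "API_TOKEN")
print(body)
```

From here, any HTTP client (e.g. `urllib.request` or `requests`) can POST the body to the endpoint with the given headers.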

The Basic Rules for Image Recognition Models Training

Each image recognition task contains at least two labels (classes, categories) – e.g., cats and dogs. A classic image recognition model (task) assigns one label to each image – so the image is either a cat or dog. In general, the more classes you have, the more data you will need to teach the neural network to predict labels.

Binary image classification for cats and dogs.
Binary classification for cats and dogs. Source: Kelly Lacy (Pexels), Pixabay

The training images should represent the real data that will be analyzed in a production setting. For example, if you aim to build a medical diagnostic tool helping radiologists identify the slightest changes in the lung tissue, you need to assemble a database of x-ray images with proven pathological findings. For the first training of your task, we recommend sticking to these simple rules:

  • Start with binary classification (two labels) –  use 50–100 images/label
  • Use about 20 labels for basic and 100 labels for more complex solutions
  • For well-defined labels use 200+ images/label
  • For hard to recognize labels add 100+ images/label
  • Pattern recognition – for structures, x-ray images, etc. use 50–100 images/label

Always keep in mind that training one task with hundreds of labels on a small dataset almost never works. To achieve solid results, you need at least 20 images per label to start with, and ideally 100+. Start with the recommended counts, and then add more if needed.

Create Category, Tag or Task via Ximilar App platform.
You can create your image recognition model via app.ximilar.com without coding.
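The rules of thumb above are easy to turn into a sanity check for your own dataset. A minimal sketch, where the 20/100 thresholds simply mirror the recommendations in this section:

```python
def check_dataset(counts, start_min=20, solid_min=100):
    """Flag labels whose image counts fall below the recommended minimums.

    counts: {label: number_of_images}. Returns {label: issue_description}.
    """
    issues = {}
    for label, n in counts.items():
        if n < start_min:
            issues[label] = f"only {n} images - below the hard minimum of {start_min}"
        elif n < solid_min:
            issues[label] = f"{n} images - consider adding up to {solid_min}+ for solid results"
    return issues

print(check_dataset({"cat": 150, "dog": 60, "guinea pig": 12}))
```

Running such a check before every training run is a cheap way to catch labels that quietly fell behind as the taxonomy grew.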

The Difference Between Testing & Production

The users of Ximilar App can train tasks with a minimum of 20 images per label. Our platform automatically divides your input data into two datasets – training & test set, usually in a ratio of 80:20. The training set is used to optimize the parameters of the classifier. During the training, the training images are augmented in several ways to extend the set.

The test data (about 20 %) are then used to validate and measure accuracy by simulating how the model will perform in production. You can see the accuracy results on the Task dashboard in Ximilar App. You can also create an independent test dataset and evaluate it. This is a great way to get accurate results on a dataset that was not seen by the model in the training before you actually deploy it.

Remember, the lower limit of 20 images per label usually leads to weak results and low accuracy: the model memorizes the few training examples instead of learning general patterns, which is called overfitting. While that might be enough for your testing, it won't be enough for production. Most of the time, the accuracy in Ximilar is pretty high, easily over 80 % even for small datasets. However, it is common in machine learning to use more images for more stable and reliable results in production. Some tasks need hundreds or thousands of images per label for good performance of the production model. Read more about the advanced options for training.
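The 80:20 split described above is straightforward to reproduce for your own experiments. A minimal sketch; the platform's actual split may differ in details, for example by stratifying per label:

```python
import random

def train_test_split(items, test_ratio=0.2, seed=42):
    """Shuffle and split a list of (image, label) pairs roughly 80:20."""
    rng = random.Random(seed)       # fixed seed for a reproducible split
    shuffled = items[:]
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]

images = [(f"img_{i}.jpg", "cat" if i % 2 else "dog") for i in range(100)]
train, test = train_test_split(images)
print(len(train), len(test))  # 80 20
```

The key property is that the test images never influence training, so the accuracy measured on them approximates production behaviour.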

The Best Practices in Image Recognition Training

Start With Fewer Categories

I usually advise first-time users to start with up to 10 categories. For example, when building an app for people to recognize shoes, you would start with 10 shoe types (running, trekking, sneakers, indoor sport, boots, mules, loafers …). It is easier to train a model with 10 labels, each with 100 training images of a shoe type, than with 30 types. You can let users upload new shoe images. This way, you can get an amazing training dataset of real images in one month and then gradually update your model.

Use Multiple Recognition Tasks With Fewer Categories

The simpler classifiers can be incredibly helpful. Actually, we can end up with more than 30 types of shoes in one model. However, as we said, it is harder to train such a model. Instead, we can create a system with better performance if we create one model for classifying footwear into main types – Sport, Casual, Elegant, etc. And then for each of the main types, we create another classifier. So for Sport, there will be a model that classifies sports shoes to Running shoes, Sneakers, Indoor shoes, Trekking shoes, Soccer shoes, etc.

Use Binary Classifiers for Important Classes

Imagine you are building a tagging model for real estate websites, and you have a small training dataset. You can first separate your images into estate types. For example, start with a binary classifier that separates images to groups “Apartment” and “Outdoor house”. Then you can train more models specifically for room types (kitchen, bedroom, living room, …), apartment features, room quality, etc. These models will be used only if the image is labelled as “Apartment”.

Ximilar Flows allow you to connect multiple custom image recognition models to work as one.
Ximilar Flows allow you to connect multiple custom image recognition models to API.

You can connect all these tasks via the Flows system with a few clicks. This way, you can chain multiple image recognition models in one API endpoint and build a powerful visual AI. Typical use cases for Flows are in the e-commerce and healthcare fields. Systems for fashion product tagging can also contain thousands of labels. It’s hard to train just one model with thousands of labels that will have good accuracy. But, if you divide your data into multiple models, you will achieve better results in a shorter time! For labelling work, you can use our image Annotation system if needed.

Choose Your Training Images Wisely

Machine learning performs better when the distribution of the training and the evaluated pictures is even. In other words, your training pictures should be visually very similar to the pictures your model will analyze in a production setting. So if your model will be used in a CCTV setting, your training data must come from CCTV cameras. Otherwise, you are likely to build a model that performs great on the training data but completely fails when used in production.

The same applies to real estate and other fields. If the system will analyze images of real estate that were not all taken by professional photographers, then you need to include photos from smartphones, photos with bad lighting, blurry images, etc.

Typical home decor and real estate images used for image recognition. Your model should be able to recognize both professional and non-professional images. Source: Pexels.

Improving the Accuracy of the System

When you click the training button on the task page, a new model is created and put in the training queue. If you upload more data or change labels, you can train a new model. You can keep multiple versions and deploy to the API only the specific version that works best for you. At the bottom of the task page, you can find a table with all your trained models (only the last 5 are stored). For each trained model, we store several metrics that are useful when deciding which model to pick for production.

Multiple model versions of your image recognition task in the Ximilar Platform. Click on Activate and that version will be deployed as an API.
Multiple model versions of your task in the Ximilar Platform. Click on Activate and that version will be deployed as an API.

Inspect the Results and Errors

Click on the zoom icon in the list of trained models to inspect the results. You can see the basic metrics: Accuracy, Recall, and Precision. Precision tells you the probability that the model is right when it predicts a specific label. Recall tells you what fraction of the images that actually belong to a label the model correctly identifies. If we have high recall but lower precision for the label "Apartment" from our real estate example, then the model is probably predicting "Apartment" for almost every image (even images that should be "Outdoor house"). The solution is probably simple – just add more pictures that represent "Outdoor house".

The Confusion Matrix shows you which labels are easily confused by the trained model. These labels probably contain similar images, and it is therefore hard for the model to distinguish between them. Another useful component is Failed Images (misclassified images), which shows you the model's mistakes on your data. With Failed Images, you can also spot labelling mistakes in your data and fix them immediately. All of these features will help you build a more reliable model with good performance.

Inspecting results of your trained image recognition models can show you potential problems in your data.
Inspecting the results of your trained models can show you potential problems in your data.
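Precision and recall can be computed directly from a confusion matrix. Here is a small sketch reproducing the "Apartment" scenario described above; the counts are made up for illustration:

```python
def precision_recall(confusion, label):
    """Compute precision and recall for one label.

    confusion[true_label][predicted_label] = count; every row is assumed to
    contain an entry for every label.
    """
    labels = confusion.keys()
    tp = confusion[label][label]
    predicted = sum(confusion[t][label] for t in labels)         # column sum
    actual = sum(confusion[label][p] for p in confusion[label])  # row sum
    precision = tp / predicted if predicted else 0.0
    recall = tp / actual if actual else 0.0
    return precision, recall

# High recall but low precision for "Apartment": the model calls almost
# everything an apartment, including many outdoor houses.
confusion = {
    "Apartment":     {"Apartment": 90, "Outdoor house": 10},
    "Outdoor house": {"Apartment": 60, "Outdoor house": 40},
}
p, r = precision_recall(confusion, "Apartment")
print(round(p, 2), round(r, 2))  # 0.6 0.9
```

In this situation the fix suggested above applies: adding more "Outdoor house" training images should raise the precision of "Apartment".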

Reliability of the Image Recognition Results

Every client is looking for reliability and robustness. Stay simple if you aim to reach high accuracy. Build models with just a few labels if you can. For more complex tagging systems use Flows. Building an image classifier with a limited number of training images needs an iterative approach. Here are a few tips on how to achieve high accuracy and reliable results:

  • Break your large task into simple decisions (yes or no) or basic categories (red, blue and green)
  • Make fewer categories & connect them logically
  • Use general models for general categories
  • Make sure your training data represent the real data your model will analyze in production
  • Each label should have a similar amount of images, so the data will be balanced
  • Merge very close classes (visually similar), then create another task only for them, and connect it via Flows
  • Use both human and UI feedback to improve the quality of your dataset – inspect evaluation metrics like Accuracy, Precision, Recall, Confusion Matrix, and Failed Images
  • Always collect new images to extend your dataset

Summary for Training Image Recognition Models

Building an image classifier requires a proper task definition and continuous improvements of your training dataset. If the size of the dataset is challenging, start simple and gradually iterate towards your goal. To make the basic setup easier, we created a few step-by-step video tutorials. Learn how to deploy your models for offline use here, check the other guides, or our API documentation. You can also see for yourself how our pre-trained models perform in the public demo.

We believe that with the Ximilar platform, you are able to create highly complex, customizable, and scalable solutions tailored to the needs of your business – check the use cases for quality control, visual search engines, or fashion. The basic features in our app are free, so anyone can try it. Training image recognition models is also free with the Ximilar platform; you pay only for calling the model for prediction. We are always here to discuss your custom projects and all the challenges in person or on a call. If you have any questions, feel free to contact us.

Image Annotation Tool for Teams https://www.ximilar.com/blog/image-annotation-tool-for-teams/ Thu, 06 May 2021 11:55:57 +0000 https://www.ximilar.com/?p=4115 Annotate is an advanced image annotation tool supporting complex taxonomies and teamwork on computer vision projects.

The post Image Annotation Tool for Teams appeared first on Ximilar: Visual AI for Business.

Over the years, we have worked with many annotation tools. The problem is that most desktop annotation apps are offline and intended for single-person use, not for team cooperation. The web-based apps, on the other hand, mostly focus on data management with photo annotation, not on a whole ecosystem with API and inference systems. In this article, I review what a good image annotation tool should do and explain the basic features of our own tool – Annotate.

Every big machine learning project requires the active cooperation of multiple team members – engineers, researchers, annotators, product managers, or owners. For example, supervised deep learning for object detection and segmentation outperforms unsupervised solutions, but it requires a lot of data with correct annotations. Annotation of images is one of the most time-consuming parts of every deep learning project, so picking the right annotation tool is critical. When your team grows and your projects require higher complexity over time, you may encounter new challenges, such as:

  • Adding labels to the taxonomy would require re-checking a lot of your work
  • Increasing the performance of your models would require more data
  • You will need to monitor the progress of your projects

Building solid annotation software for computer vision is not an easy task. And yes, it requires a lot of failures and taking many wrong turns before finding the best solution. So let’s look at what should be the basic features of an advanced data annotation tool.

What Should an Advanced Image Annotation Tool Do?

Many customers are using our cloud platform Ximilar App in very specific areas, such as Fashion, Healthcare, Security, or Industry 4.0. The environment of a proper AI helper or tool should be complex enough to cover requirements like:

  • Features for team collaboration – you need to assign tasks, and then check the quality and consistency of data
  • Great user experience for dataset curation – everything should be as simple as possible, but no simpler
  • Fast production of high-quality datasets for your machine-learning models
  • Work with complex taxonomies & many models chained with Flows
  • Fast development and prototyping of new features
  • Connection to Rest API with Python SDK & querying annotated data

With these needs in mind, we created our own image annotation tool. We use it in our internal projects and provide it to our customers as well. Our technologies for machine learning accelerate the entire pipeline of building good datasets. Whether you are a freelancer tagging pictures or a team managing product collections in e-commerce, Annotate can help.

Our Visual AI tools enable you to work with your own custom taxonomy of objects, such as fashion apparel or things captured by the camera. You can read the basics on the categories & tags and machine learning model training, watch the tutorials, or check our demo and see for yourself how it works.

The Annotate

Annotate is an advanced image annotation tool, which enables you to annotate images precisely and fast. It works as an end-to-end platform for visual data management. You can query the same images, change labels, create objects, draw bounding boxes and even polygons here.

It is a web-based online annotation tool that works fully in the cloud. Since it is connected to the same back end & database as the Ximilar App, all changes you make in Annotate manifest in your workspace in the App, and vice versa. You can create labels, tasks & models, or upload images through the App, and use them in Annotate.

Ximilar Application and Annotate are connected to the same backend (api.ximilar.com) and the same database.

Annotate extends the functionalities of the Ximilar App. The App is great for training, creating entities, uploading data, and batch management of images (bulk actions for labelling and filtering). Annotate, on the other hand, was created for detail-oriented management of images. The default single-image zoomed view brings advantages, such as:

  • Identifying separate objects, drawing polygons and adding metadata to a single image
  • Suggestions based on AI image recognition help you choose from very complex taxonomies
  • The annotators focus on one image at a time to minimize the risk of mistakes

Interested in getting to know Annotate better? Let’s have a look at its basic functions.

Deep Focus on a Single Image

If you enter the Images (left menu), you can open any image in the single image view. To the right of the image, you can see all the items located in it. This is where most of the labelling is done. There is also a toolbar for drawing objects and polygons, labelling images, and inspecting metadata.

In addition, you can zoom in/out and drag the image. This is especially helpful when working with smaller objects or high-resolution images. For example, teams annotating medical microscope samples or satellite pictures can benefit from this robust tool.

The main view of the image in our Fashion Tagging workspace

Create Multiple Workspaces

Some of you already know this from other SaaS platforms. The idea is to divide your data into several independent storages. Imagine your company is working on multiple projects at the same time and each of them requires you to label your data with an image annotation tool. Your company account can have many workspaces, each for one project.

Here is our active workspace for Fashion Tagging

Within the workspaces, you don’t mix your images, labels, and tasks. For example, one workspace contains only images for fruit recognition projects (apples, oranges, and bananas) and another contains data on animals (cats and dogs).

Your team members can get access to different workspaces. Everyone can also switch between workspaces in the App as well as in Annotate (top right, next to the user icon). Did you know that the workspaces are also accessible via API? Check out our documentation and learn how to connect to the API.
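For instance, listing your workspaces could be scripted. Here is a minimal Python sketch; note that the endpoint path and the token header format are assumptions based on common REST conventions, so check the official documentation for the real ones:

```python
# Hypothetical sketch of listing workspaces over the Ximilar API.
# The endpoint path and auth header format are assumptions; consult
# the official API docs for the authoritative reference.
import json
import urllib.request

API_BASE = "https://api.ximilar.com"

def build_workspace_request(token: str) -> urllib.request.Request:
    """Build an authenticated GET request for an assumed workspace endpoint."""
    return urllib.request.Request(
        f"{API_BASE}/account/v2/workspace/",  # assumed path
        headers={"Authorization": f"Token {token}"},
    )

req = build_workspace_request("YOUR_API_TOKEN")
# with urllib.request.urlopen(req) as resp:   # uncomment with a real token
#     workspaces = json.load(resp)
```

The same authenticated pattern would apply to any other endpoint you use from your scripts.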

Train Precise AI Models with Verification

Building good computer vision models requires a lot of data, high-quality annotations, and a team of people who understand the process of building such a dataset. In short, to create high-quality models, you need to understand your data and have a perfectly annotated dataset. In the words of the Director of AI at Tesla, Andrej Karpathy:

"Labeling is a job for highly trained professionals."

Annotate helps you build high-quality AI training datasets through verification. Every image can be verified by different users in the workspace. You can increase precision by training your models only on verified images.

A list of users who verified the image, with the exact dates

Verifying your data is a necessary requirement for creating good deep-learning models. To verify an image, simply click the Verify or Verify & Next button (if you are working on a job). You will be able to see who verified any particular image and when.

Create and Track Image Annotating Jobs

When you need to process the newly uploaded images, you can assign them to a Job and a team of people can process them one by one in a job queue. You can also set up exactly how many times each image should be seen by the people processing this queue.

Moreover, you can specify which photo recognition model or flow of models should be displayed when doing the job. For example, here is the view of the jobs that we use in one of our tagging services.

Two jobs are waiting to be completed by annotators; you can start working by hitting the play button on the right

When working on a job, every time an annotator hits the Verify & Next button, it will redirect them to a new image within the job. You can track the progress of each job in the Jobs overview. Once the image annotation job is complete, the progress bar turns green, and you can proceed to the next steps: retraining the models, uploading new images, or creating another job.

Draw Objects and Polygons

Sometimes, recognizing the most probable category or tags for an image is not enough. That is why Annotate also lets you pinpoint the location of specific things by drawing objects and polygons. The great thing is that you do not pay any credits for drawing objects or labelling. This makes Annotate one of the most cost-effective online apps for image annotation.

Simply click and drag with the rectangle tool on the canvas to create a detection object.

What exactly do you pay for when annotating data? API credits are counted only for data uploads, with volume-based discounts. This makes Annotate an affordable, yet powerful tool for data annotation. If you want to know more, read our newest article on API Credit Packs, or check our Pricing Plans or Documentation.

Annotate With Complex Taxonomies Elegantly

The greatest advantage of Annotate is working with very complex taxonomies and attribute hierarchies. That is why it is usually used by companies in E-commerce, Fashion, Real Estate, Healthcare, and other areas with rich databases. For example, our Fashion tagging service contains more than 600 labels that belong to more than 100 custom image recognition models. The taxonomy tree for some of the biotech projects can be even broader.

Navigating through the taxonomy of labels is very elegant in Annotate – via Flows. Once your Flow is defined (our team can help you with it), you simply add labels to the images. The branches expand automatically when you add labels. In other words, you always see only essential labels for your images.

Simply navigate through your taxonomy tree; branches expand when you click on specific labels.

For example, this image contains a fashion object "Clothing", to which we need to assign more labels. Adding the Clothing/Dresses label will expand the tags in the Length Dresses and Style Dresses tasks. If you select the label Elegant from Style Dresses, only the features & attributes you need will be suggested for annotation.
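The expanding behaviour can be pictured with a toy structure. This is an illustration only, not Ximilar's internal representation; the label and task names are borrowed from the example above:

```python
# Toy model of a flow: selecting a label reveals only the tasks
# nested under it, so annotators never see the whole taxonomy at once.
TAXONOMY = {
    "Clothing/Dresses": ["Length Dresses", "Style Dresses"],
    "Style Dresses/Elegant": ["Neckline", "Sleeve Type"],  # hypothetical children
}

def visible_tasks(selected_labels):
    """Return only the tasks exposed by the currently selected labels."""
    tasks = []
    for label in selected_labels:
        tasks.extend(TAXONOMY.get(label, []))
    return tasks

# Selecting "Clothing/Dresses" exposes just its two dress tasks.
print(visible_tasks(["Clothing/Dresses"]))
```

Each click effectively performs one such lookup, which is why even a taxonomy with hundreds of labels stays navigable.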

Automate Repetitive Tasks With AI

Annotate was initially designed to speed up the work when building computer vision solutions. When annotating data, manual drawing & clicking is a time-consuming process. That is why we created the AI helper tools to automate the entire annotating process in just a few clicks. Here are a few things that you can do to speed up the entire annotation pipeline:

  • Use the API to upload your previously annotated data to train or re-train your machine learning models and use them to annotate or label more data via API
  • Create bounding boxes and polygons for object detection & instance object segmentation with one click
  • Create jobs, share the data, and distribute the tasks to your team members
Predicting bounding boxes with one click speeds up the entire annotation process.
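As a rough illustration of the first point, a pre-labelled image upload might be prepared like this in Python. The payload field names are assumptions, not the documented schema, so verify them against the API documentation before use:

```python
# Hedged sketch: preparing an already-labelled image for upload so it can
# be used to (re)train a model. Field names ("base64", "labels") are
# assumptions about the upload schema.
import base64

def build_upload_payload(image_bytes: bytes, label_ids: list) -> dict:
    """Base64-encode raw image bytes and attach existing label IDs."""
    return {
        "base64": base64.b64encode(image_bytes).decode("ascii"),
        "labels": label_ids,
    }

payload = build_upload_payload(b"\x89PNG...", ["label-apple", "label-ripe"])
# POST this JSON, authenticated with your API token, to the training-image
# endpoint described in the documentation.
```

Re-training can then be triggered from the same script once the new images are in, closing the annotate-train-predict loop.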

Image Annotation Tool for Advanced Visual AI Training

As the main focus of Ximilar is AI for sorting, comparing, and searching multimedia, we integrate image annotation into the building of AI search models. This is something we have missed in other data annotation applications. To build such models, you need to group multiple items (images or objects, typically product pictures) into Similarity Groups. Annotate helps us create datasets for building strong image similarity search models.

Grouping the same or similar images with the image annotation tool. You can tell which item is a smartphone photo or which photos belong on an e-commerce platform.

Annotate is Always Growing

Annotate was originally developed as our internal image annotation software, and we have already delivered a lot of successful solutions to our clients with it. It is a unique product that any team can benefit from to improve their computer vision models remarkably fast.

We plan to introduce more data formats, such as videos, satellite imagery (Sentinel maps), and 3D models, to level up Visual AI in fields such as visual quality control or AI-assisted healthcare. We are also constantly working on adding new features and improving the overall experience of Ximilar services.

Annotate is available for all users with Business & Professional pricing plans. Would you like to discuss your custom solution or ask anything? Let’s talk! Or read how the cooperation with us works first.

The post Image Annotation Tool for Teams appeared first on Ximilar: Visual AI for Business.
