Ximilar App - Ximilar: Visual AI for Business

We Introduce Plan Overview & Advanced Plan Setup

Zuzana Raidová — Tue, 24 Sep 2024 13:58:05 +0000

We’re excited to introduce new updates to Ximilar App! As a machine learning platform for training and deploying computer vision models, it also lets you manage subscriptions, monitor API credit usage, and purchase credit packs.

These updates aim to improve your experience and streamline plan setup and credit consumption optimization. Here’s a quick rundown of what’s new.

Plan Setup: Simplified Subscription Management

We’ve revamped the subscription page with new features and better functionality. The Plan Setup page now allows you to choose between Free, Business, or Professional plans, customize your monthly credit supply using a slider, and access our new API Credit Consumption Calculator—a handy tool to help you make informed decisions.

Plan setup in Ximilar App.

The entire checkout process has been streamlined as well, allowing you to adjust your payment method directly before completing your purchase.

Manage Your Payment Methods and Currencies

You can change the default currency for plan setup and payments in the Settings. To update your payment method, simply access the Stripe Portal from your Plan Overview under “More Actions.” If you prefer a different payment method or have any additional questions, feel free to reach out to us!

Credit Calculator: Estimate & Optimise Your Credit Consumption

One of the most exciting additions to the app is the new Credit Calculator, now available directly within the platform. While this tool was previously featured on our Pricing page, it’s now integrated into the app as well, allowing you to not only estimate your credit needs but also preset your subscription plan directly from the calculator.

Once you’ve adjusted your credits based on projected usage, you can proceed straight to checkout, making the entire process of optimizing and purchasing credits smoother and more efficient.

Credit consumption calculator in Ximilar App.

Plan Overview: A Complete View of Your Plans and Credits

The Plan Overview page gives you a comprehensive view of your active subscription, any past plans, and your pre-paid credit packs. Previously, credit information was limited to your dashboard, but now you have detailed insight into your credit usage and plan history.

Plan overview in Ximilar App.

In the Plan Overview, you can view all your current active subscription plans. If you upgrade or downgrade, multiple plans may temporarily appear, as credits from your previous plan remain available until the end of the billing period.

Reports: Detailed Insights into Credit Usage

Our new Reports page enables you to gain deeper insights into your API credit usage. It provides two types of reports: credit consumption by AI solution (e.g., Card Grading) and by individual operation within a solution (e.g., “grade one card” within the Card Grading solution).

Reports in Ximilar App give you detailed insight into your API credit consumption.

Credit Packs: Flexibility to Buy Extra Credits Anytime

API credit packs act as a safety net for unexpected system loads. They now have a dedicated page where you can purchase additional packs as needed. You can also compare their pricing against higher subscription plans and choose the most cost-effective option. Both your active and used credit packs are displayed on the Plan Overview page.

API Credit packs page in Ximilar App.

Invoices: All Your Purchases in One Place

This updated page neatly lists all your invoices, including both subscription payments and one-time credit pack purchases, ensuring that all your financial information is in one place.

Invoices in Ximilar App.

Greater Control & Flexibility For the Users

These updates are designed to provide you with greater control, transparency, and flexibility as you build and deploy visual AI solutions. All of these features are now accessible in your sidebar. Check them out, and feel free to reach out with any questions!

The post We Introduce Plan Overview & Advanced Plan Setup appeared first on Ximilar: Visual AI for Business.

New AI Solutions for Card & Comic Book Collectors

Zuzana Raidová — Wed, 18 Sep 2024 12:35:34 +0000

Recognize and Identify Comic Books in Detail With AI

The newest addition to our portfolio of solutions is Comics Identification (/v2/comics_id). This service is designed to identify comics from images. While it’s still in the early stages, we are actively refining and enhancing its capabilities.

The API detects the largest comic book in an image and provides key information such as the title, issue number, release date, publisher, origin date, and creator’s name. This makes it well suited to identifying comic books, magazines, and manga.

Comics Identification by Ximilar provides the title, issue number, release date, publisher, origin date, and creator’s name.

This tool is perfect for organizing and cataloging large comic collections, offering accurate identification and automation of metadata extraction. Whether you’re managing a digital archive or cataloging physical collections, the Comics Identification API streamlines the process by quickly delivering essential details. We’re committed to continuously improving this service to meet the evolving needs of comic identification.

Star Wars Unlimited, Digimon, Dragon Ball, and More Can Now Be Recognized by Our System

Our trading card identification system has already been widely used to accurately recognize and provide detailed information on cards from games like Pokémon, Yu-Gi-Oh!, Magic: The Gathering, One Piece, Flesh and Blood, MetaZoo, and Lorcana.

Recently, we’ve expanded the system to include cards from Garbage Pail Kids, Star Wars Unlimited, Digimon, Dragon Ball Super, Weiss Schwarz, and Union Arena. And we’re continually adding new games based on demand. For the full and up-to-date list of recognized games, check out our API documentation.

Ximilar keeps adding new games to the trading card game recognition system. It can easily be deployed via API and controlled in our App.

Detect and Identify Both Trading Cards and Their Slab Labels

The new endpoint slab_grade processes your list of image records to detect and identify cards and slab labels. It utilizes advanced image recognition to return detailed results, including the location of detected items and analyzed features.

Graded slab reading by Ximilar AI.

The Slab Label object provides essential information, such as the company or category (e.g., BECKETT, CGC, PSA, SGC, MANA, ACE, TAG, Other), the card’s grade, and the side of the slab. This endpoint enhances our capability to categorize and assess trading cards with greater precision. In our App, you will find it under Collectibles Recognition: Slab Reading & Identification.

Automatic Recognition of Collectibles

Ximilar built an AI system for the detection, recognition and grading of collectibles. Check it out!

New Endpoint for Card Centering Analysis With Interactive Demo

Given a single image record, the centering endpoint returns the position of a card and performs centering analysis. You can also get a visualization of grading through the _clean_url_card and _exact_url_card fields.

The _tags field indicates if the card is autographed, its side, and type. Centering information is included in the card field of the record.
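
As a minimal sketch of reading these fields, assume the response record has already been parsed into a Python dict. The key names _clean_url_card, _exact_url_card, _tags, and card come from the description above. The function name, the sample values, and the inner shape of _tags and card are made up for illustration and should be checked against the documentation.

```python
def summarize_centering_record(record):
    """Collect the centering-related fields named above from one parsed record."""
    return {
        "clean_url": record.get("_clean_url_card"),
        "exact_url": record.get("_exact_url_card"),
        "tags": record.get("_tags", {}),
        "card": record.get("card"),
    }

# Hypothetical record: only the key names are taken from the text above.
sample_record = {
    "_clean_url_card": "https://example.com/clean.jpg",
    "_exact_url_card": "https://example.com/exact.jpg",
    "_tags": {"Side": "front"},
    "card": {"note": "centering details would appear here"},
}

summary = summarize_centering_record(sample_record)
print(summary["clean_url"])
```

Using .get() keeps the helper from failing on records where a visualization URL or the tags are missing.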

The card centering API by Ximilar returns the position of a card and performs centering analysis.

Learn How to Scan and Identify Trading Card Games in Bulk With Ximilar

Our new guide How To Scan And Identify Your Trading Cards With Ximilar AI explains how to use AI to streamline card processing with card scanners. It covers everything from setting up your scanner and running a Python script to analyzing results and integrating them into your website.

Let Us Know What You Think!

And that’s a wrap on our latest updates to the platform! We hope these new features might help your shop, website, or app grow traffic and gain an edge over the competition.

If you have any questions, feedback, or ideas on how you’d like to see the services evolve, we’d love to hear from you. We’re always open to suggestions because your input shapes the future of our platform. Your voice matters!

How to Identify Sports Cards With AI

Michal Lukáč — Mon, 12 Feb 2024 11:47:38 +0000

We have huge news for the collectors and collectibles marketplaces. Today, we are releasing an AI-powered system able to identify sports cards. It was a massive amount of work for our team, and we believe that our sports card identification API can benefit a lot of local shops, small and large businesses, as well as individual developers who aim to build card recognition apps.

Sports Cards Collecting on The Rise

Collecting sports cards, including hockey cards, has long been a popular hobby. As a big hockey fan, I collected hockey cards as a child. Today, card collecting has evolved into an investment, and many new collectors enter the community solely to buy and sell cards on various marketplaces.

Some traditional baseball rookie cards can have significant value, for example, the estimated price of a vintage Mickey Mantle PSA 10 1952 Topps rookie baseball card is $15 million – $30 million.

Our Existing Solutions for Card Collector Sites & Apps

Last year, we already released several services focused on trading cards:

First, we released a Trading Card Game Identifier API. It can identify trading card games (TCGs) such as Pokémon, Magic: The Gathering (MTG), Yu-Gi-Oh!, and more. We believe that this system is among the fastest and most accurate in the world.
Second, we built a Card Grading and fast Card Conditioning API for both sports and trading card games. Within seconds, this service evaluates the corners, edges, and surface, and checks the centring, in a card scan, screenshot, or photo. Each of these features is graded independently, resulting in an overall grade. The outputs can be either numeric values or condition labels (following eBay or TCGPlayer naming). You can test it here.
We have also been building custom visual search engines for private collections of trading cards and other collectibles. With this feature, people can visit marketplaces or use their apps to upload card images and search for identical or similar items in their database with a click. Visual search is a standard AI-powered function in major price comparators. If a particular game is not on our list, or if you wish to search within your own collection, list, or portfolio of other collectibles (e.g., coins, stamps, or comic books), we can also create it for you. Just let us know.

We have been gradually establishing a track record of successful projects in the collectibles field. From the feedback of our customers, we hear that our services are much more precise than the competition. So a couple of months ago, we started building a sports card scanning system as well. It allows users to send the scan to the API, and get back precise identification of the card.

Our API is open to all developers. Just sign up to Ximilar App, and you can start building your own great product on top of it!

Test it Now in Live Demo

This solution is already available for testing in our public demo. Try it for free now!

The Main Features of Sports Cards

There are several factors determining the value of the card:

Rarity & Scarcity: Cards with limited production runs or those featuring star players are often worth more.
Condition: Like any collectible item, the condition of a sports card is crucial. Cards in mint or near-mint condition are generally worth more than those with wear and tear.
Grade & Grading services: Graded cards (from PSA or Beckett) typically have higher prices in the market.
The fame of the player: Names of legends like Michael Jordan or Shohei Ohtani instantly add value to the trading cards in your collection.
Autographs, memorabilia, and other features that add to the card’s rarity.

Each card manufacturer must have legal rights and licensing agreements with the sports league, teams, or athletes. Right now, there are several main producers:

Panini – This Italian company is the largest player in the market in terms of licensing agreements and number of releases.
Topps – Topps is an American company with a long history. It now releases baseball, basketball, and MMA cards.
Upper Deck – Upper Deck is a company with an exclusive license for hockey cards from the NHL.
Futera – Futera focuses mostly on soccer cards.

Example of Upper Deck, Futera, Panini Prizm and Topps Chrome cards.

Dozens of other card manufacturers were acquired by these few players. They add their brands or names as special sets in their releases. For example, the Fleer company was acquired by Upper Deck in 2005 and Donruss was bought by Panini.

Identifying Sports Cards With Artificial Intelligence

When it comes to sports cards, it’s crucial to recognize that the identification challenge is more complex than that of Pokémon or Magic The Gathering cards. While these games present challenges such as identical trading card artworks in multiple sets or different language variants, sports cards pose distinct difficulties in recognition and identification, such as:

Amount of data/cards – The companies add a lot of new cards to their portfolios each year. By now, the total runs to tens of millions of cards.
Parallels, variations, and colours – The card can have multiple variants with different colours, borders, various foil effects, patterns, or even materials. More can be read in a great article by getcardbase.com. Look at the following example of the NBA’s LeBron James card, and some of its variants.

LeBron James 2021 Donruss Optic #41 card in several variations of different parallels and colors.

Special cards: Short Print (SP) and Super Short Print (SSP) cards are intentionally produced in smaller quantities than the rest of the set. The most common special cards are Rookie cards (RC), which feature a player in their rookie season and therefore hold sentimental and historical value.
Serial numbered cards: Trading cards that have a unique serial number printed directly on the card itself.
Authentic signature/autograph: These are usually official signature cards, signed by players. To examine the authenticity of the signature, and thus ensure the card’s value, reputable trading card companies may employ card authentication processes.
Memorabilia: In the context of trading cards, memorabilia cards are special cards that feature a piece of an athlete’s equipment, such as a patch from a uniform, shoe, or bat. Sports memorabilia are typically more valuable because of their rarity. These cards are also called relic cards.

As you can see, it’s not easy to identify the card and its price and to keep track of all its different variants.

Example: Panini Prizm Football Cards

Take for example the 2022 Panini Prizm Football Cards and the parallel cards. Gold Prizms (10 cards) are worth much more than the Orange Prizms (with 250 cards) because of their scarcity. Upon the release of a card set, the accompanying checklist, presented as a population table, is typically made available. This provides detailed information about the count for each variation.
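
To make the scarcity gap concrete, here is a small sketch that uses only the two print runs quoted above (10 Gold, 250 Orange). It compares how many copies exist and says nothing about actual market prices. The function name is illustrative.

```python
# Print runs quoted above for two 2022 Panini Prizm Football parallels.
print_runs = {"Gold": 10, "Orange": 250}

def scarcity_ratio(rarer, commoner, runs):
    """How many copies of the commoner parallel exist per copy of the rarer one."""
    return runs[commoner] / runs[rarer]

ratio = scarcity_ratio("Gold", "Orange", print_runs)
print(f"For every Gold Prizm there are {ratio:.0f} Orange Prizms")
```

With a full population table, the same function could be applied to any pair of parallels in a set.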

2022 Panini Prizm Football Cards examples. (Source: beckett.com)

Next, for Panini Prizm, there are more than 20 parallel foil patterns like Speckle, Hyper, Diamond, Fast Break/Disco/No Huddle, Flash, Mozaic, Mojo, Pulsar, Shimmer, etc. with all possible combinations of colours such as green, blue, pink, purple, gold, and so on.

These combinations matter because some of them are rarer than others. The names of foil patterns also differ between companies. For example, Topps has chrome Speckle patterns, which are almost identical to the Panini Prizm Sparkle pattern.

Lastly, no database contains each picture for every card in the world. This makes visual search extremely hard for cards that have no picture on the internet.

If you feel lost in all the variations and parallel cards, you are not alone.

Luckily, we developed (and are actively improving) an AI service that tackles the problems mentioned above. It is available as an open REST API, so anyone can connect and integrate their system with ours. Results come back within seconds, making it one of the fastest services on the market.

How to Identify Sports Cards Via API?

In general, you can connect to the REST API with any programming language, such as Python or JavaScript. Our developer documentation will serve as a guide, with many helpful instructions and tips.

To access our API, sign in to Ximilar App to get your unique API authentication token. You will find the administration of your services under Collectibles Recognition. Here is an example REST request via curl:

$ curl https://api.ximilar.com/collectibles/v2/sport_id -H "Content-Type: application/json" -H "Authorization: Token __API_TOKEN__" -d '{
    "records": [
        { "_url": "__PATH_TO_IMAGE_URL__"}
    ], "slab_id": false
}'
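
The same request can be sketched in Python with only the standard library. The endpoint, headers, and payload mirror the curl example; the function name is hypothetical, and the network call is left commented out so the snippet runs without a real token.

```python
import json
import urllib.request

ENDPOINT = "https://api.ximilar.com/collectibles/v2/sport_id"

def build_sport_id_request(image_url, api_token, slab_id=False):
    """Build (but do not send) the same POST request as the curl example."""
    payload = {"records": [{"_url": image_url}], "slab_id": slab_id}
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Token {api_token}",
        },
        method="POST",
    )

request = build_sport_id_request("__PATH_TO_IMAGE_URL__", "__API_TOKEN__")

# To call the API, replace the placeholders above and uncomment:
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
```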

The example response when you identify sports cards with Ximilar API.

The API response will be as follows:

When the system successfully identifies the card, it returns a full identification. You will get a list of features such as the name of the player/person, the name of the set, card number, company, team, and features like foil, autograph, colour, and more. It can also generate URL links for eBay searches, so you can check card values or purchase the cards directly.
If we are not sure about the identification (or we don’t have a specific card in our system), the system returns empty search results. In such a case, feel free to ask for support.

How Does AI Sports Card Identification Work?

Our identification system uses advanced machine learning models with smart algorithms for post-processing. The system is a complex flow of models that incorporates visual search. We trained the system on a large amount of data, curated by our own annotation team.

First, we identify the location of the card in your photo. Second, we run multiple AI analyses of the card, for example to check whether it has an autograph. Third, we find the card in our collection with visual search (reverse image search). Lastly, we use AI to rerank the results to make them as precise as possible.
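
The last two steps can be illustrated with a toy example. This is not Ximilar’s implementation: the hand-made three-dimensional vectors, the cosine-similarity search, and the fixed 0.1 attribute bonus are all assumptions chosen to show how reranking can change the top result. The first two steps are treated as already done.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def visual_search(query_vec, catalog, top_k=3):
    """Step 3: return the top_k (similarity, item) pairs closest to the query."""
    scored = [(cosine(query_vec, item["vec"]), item) for item in catalog]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]

def rerank(candidates, query_attrs):
    """Step 4: boost candidates whose attributes agree with the step-2 analysis."""
    def score(pair):
        similarity, item = pair
        matches = sum(item["attrs"].get(k) == v for k, v in query_attrs.items())
        return similarity + 0.1 * matches
    return sorted(candidates, key=score, reverse=True)

# Toy catalog: made-up embeddings and one attribute per card.
catalog = [
    {"name": "base", "vec": [1.0, 0.0, 0.0], "attrs": {"autograph": False}},
    {"name": "autograph", "vec": [0.9, 0.1, 0.0], "attrs": {"autograph": True}},
    {"name": "other card", "vec": [0.0, 1.0, 0.0], "attrs": {"autograph": False}},
]

# Steps 1 and 2 assumed done: the card crop was embedded and found to be signed.
query_vec, query_attrs = [1.0, 0.05, 0.0], {"autograph": True}

candidates = visual_search(query_vec, catalog)
best = rerank(candidates, query_attrs)[0][1]["name"]
print(candidates[0][1]["name"], "->", best)
```

Here the purely visual ranking puts the base card first, while the reranked list prefers the autographed variant because it agrees with the attribute detected in the second step.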

What Sports Cards Can Ximilar Identify?

Our sports cards database contains a few million cards. Of course, this is just a small subset of all collectible cards that were produced. Right now we focus on 6 main domains: Baseball cards, Football cards, Basketball cards, Hockey cards, Soccer and MMA, and the list expands based on demand. We continually add more data and improve the system.

We try to track and include new releases every month. If you see that we are missing some cards and you have the collection, let us know. We can agree on adding them to training data and giving you a discount on API requests. Since we want to build the most accurate system for card identification in the world, we are always looking for ways to gather more cards and improve the software’s accuracy.

Who Will Benefit From AI-Powered Sports Cards Identifier?

Access to our REST API can improve your position in the market especially if:

You own e-commerce sites/marketplaces that buy & sell cards – If you have your own shop, site or market for people who collect cards, this solution can boost your traffic and sales.
You are planning to design and publish your own collector app and need an all-in-one API for the recognition and grading of cards.
You want to manage, organize and add data to your own card collection.

Is My Data Safe?

Yes. First of all, we don’t save the analysed images. We don’t even have enough storage capacity to keep every image, photo, scan, and screenshot you add to your collection. Once our system processes an image, it removes it from memory. Also, GDPR applies to all photos that enter our system. Read more in our FAQs.

How Fast is the System, Can I Connect it to a Scanner?

The system can identify one card scan in one second. You can connect it to any card scanner available on the market. The scanner saves the card images into folders, and you can then run a card identification script over them.
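
A minimal sketch of the first half of such a script is shown below: it walks a folder of scans and groups the images into request payloads. The "_base64" field name for inline image data, the batch size, and the function name are assumptions to verify against the API documentation. Sending the payloads is left out.

```python
import base64
import tempfile
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}

def build_batches(folder, batch_size=10):
    """Group image files in `folder` into payloads of at most batch_size records."""
    files = sorted(p for p in Path(folder).iterdir() if p.suffix.lower() in IMAGE_SUFFIXES)
    batches = []
    for start in range(0, len(files), batch_size):
        chunk = files[start:start + batch_size]
        records = [
            # "_base64" is an assumed field name for inline image data.
            {"_base64": base64.b64encode(p.read_bytes()).decode("ascii")}
            for p in chunk
        ]
        batches.append({"files": [p.name for p in chunk], "payload": {"records": records}})
    return batches

# Demo on a temporary folder with three fake "scans" and one unrelated file.
with tempfile.TemporaryDirectory() as scan_dir:
    for name in ("card_001.jpg", "card_002.jpg", "card_003.png", "notes.txt"):
        (Path(scan_dir) / name).write_bytes(b"placeholder bytes")
    batches = build_batches(scan_dir, batch_size=2)

print([batch["files"] for batch in batches])
```

Keeping the file names next to each payload makes it possible to match the returned records back to the scanned files.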

Sports Cards Recognition Apps You Can Build With Our API

Here are a few ideas for apps that you can build with our Sport Card Identifier and REST API:

Automatic card scanning system – create a simple script that will be connected to our API and your scanners like Fujitsu fi-8170. The system will be able to document your cards with incredible speed. Several of our customers are already organizing their collections of TCGs (like Magic The Gathering or Pokémon) and adding new cards on the go.
Price checking app or portfolio analysis – create your phone app alternative to Ludex or CollX. Start documenting the cards by taking pictures and grading your trading card collection. Our system can provide card IDs, pre-grade cards, and search them in an online marketplace. Easily connect with other collectors, purchase & sell the cards. Test our system’s ability to provide URLs to marketplaces here.
Analysing eBay submission – would you like to know what your card’s worth and how many are currently available in the market? For how much was the card sold in the past? Track the price of the card over time? Or what is the card population? With our technology, you can build a system that can analyse it.

AI for Trading Cards and Collectors

So this is our latest narrow AI service for the collector community. It is quite easy to integrate it into any system. You can use it for automatic documentation of your collection or simply to list your cards on online markets.

For more information, contact us via chat or the contact page, and we can schedule a call to talk through the technical and business details. If you want to go straight to implementation, take a look at our developer API documentation, and don’t hesitate to ask for guidance anytime.

Right now, we are also working on comics identification (comic books, magazines, and manga). If you would like to hear more, just contact us via email or chat.

When OCR Meets ChatGPT AI in One API

Michal Lukáč — Wed, 14 Jun 2023 09:38:27 +0000

Imagine a world where machines not only have the ability to read text but also comprehend its meaning, just as effortlessly as we humans do. Over the past two years, we have witnessed extraordinary advancements in these areas, driven by two remarkable technologies: optical character recognition (OCR) and ChatGPT (generative pre-trained transformer). The combined potential of these technologies is enormous and offers assistance in numerous fields.

That is why we at Ximilar have recently developed an OCR system, integrated it with ChatGPT and made it available via API. It is one of the first publicly available services combining OCR software and the GPT model, supporting several alphabets and languages. In this article, I will provide an overview of what OCR and ChatGPT are, how they work, and – more importantly – how anyone can benefit from their combination.

What is Optical Character Recognition (OCR)?

OCR (Optical Character Recognition) is a technology that can quickly scan documents or images and extract text data from them. OCR engines are powered by artificial intelligence & machine learning. They use object detection, pattern recognition and feature extraction.

OCR software can read not only printed but also handwritten text in an image or a document and provide you with the extracted text in a file format of your choosing.

How Does Optical Character Recognition Work?

When an OCR engine is provided with an image, it first detects the position of the text. Then, it uses an AI model that reads individual characters to find out what the text in the scanned document says (text recognition).

This way, OCR tools can provide accurate information from virtually any kind of image file or document type. To name a few examples: PDF files containing camera images, scanned documents (e.g., legal documents), old printed documents such as historical newspapers, or even license plates.

A few examples of OCR: transcribing books to electronic form, reading invoices, passports, IDs, and landmarks.

Most OCR tools are optimized for specific languages and alphabets. We can tune these tools in many ways, for example, to automate the reading of invoices, receipts, or contracts. They can also specialize in handwritten or printed paper documents.

The basic outputs from OCR tools are usually the extracted texts and their locations in the image. The data extracted with these tools can then serve various purposes, depending on your needs. From uploading the extracted text to simple Word documents to turning the recognized text to speech format for visually impaired users.

OCR programs can also do a layout analysis for transforming text into a table. Or they can integrate natural language processing (NLP) for further text analysis and extraction of named entities (NER). For example, identifying numbers, famous people or locations in the text, like ‘Albert Einstein’ or ‘Eiffel Tower’.

Technologies Related to OCR

You can also meet the term optical word recognition (OWR). This technology is not as widely used as the optical character recognition software. It involves the recognition and extraction of individual words or groups of words from an image.

There is also optical mark recognition (OMR). This technology can detect and interpret marks made on paper or other media. It can work together with OCR technology, for instance, to process and grade tests or surveys.

And last but not least, there is intelligent character recognition (ICR). It is a specific OCR optimised for the extraction of handwritten text from an image. All these advanced methods share some underlying principles.

What are GPT and ChatGPT?

A generative pre-trained transformer (GPT) is an AI text model that generates textual outputs based on an input (prompt). GPT models are large language models (LLMs) powered by deep learning and relying on neural networks. They are incredibly powerful tools that can handle content creation (e.g., writing paragraphs of blog posts), proofreading and error fixing, explaining concepts & ideas, and much more.

The Impact of ChatGPT

ChatGPT, introduced by OpenAI, is an extension of the GPT model that is further optimized for conversations. It has had a great impact on how we search, work with, and process data.

GPT models are trained on huge amounts of textual data, so they know more than the average human being about many topics. In my case, ChatGPT definitely has better English writing & grammar skills than I do. Here’s an example of ChatGPT explaining quantum computing:

ChatGPT model explaining quantum computing. [source: OpenAI]

It is no overstatement to say that the introduction of ChatGPT revolutionized data processing, analysis, search, and retrieval.

How Can OCR & GPT Be Combined For Smart Text Extraction

The combination of OCR with GPT models enables us to use this technology to its full potential. GPT can understand, analyze and edit textual inputs. That is why it is ideal for post-processing of the raw text data extracted from images with OCR technology. You can give the text to the GPT and ask simple questions such as “What are the items on the invoice and what is the invoice price?” and get an answer with the exact structure you need.
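
As a rough sketch of this idea, the snippet below turns raw OCR text into a prompt that asks for specific fields as JSON. The function name, the prompt wording, and the field names are illustrative assumptions, not part of any particular API. The resulting string would be passed to whichever GPT interface you use.

```python
def build_extraction_prompt(ocr_text, fields):
    """Compose a prompt asking a language model to return the given fields as JSON."""
    field_list = ", ".join(f'"{name}"' for name in fields)
    return (
        "Below is text extracted from a document with OCR.\n"
        f"Return a JSON object with exactly these keys: {field_list}.\n"
        "Use null for any value that is not present in the text.\n\n"
        f"OCR text:\n{ocr_text}"
    )

# Made-up OCR output from a receipt, for illustration only.
ocr_text = "ACME Store\nCoffee 2.50\nBagel 3.00\nTOTAL 5.50"
prompt = build_extraction_prompt(ocr_text, ["items", "total_price"])
print(prompt)
```

Naming the expected keys in the prompt is what lets downstream code parse the answer as structured data instead of free text.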

This was a very hard problem just a year ago, and a lot of companies were trying to build intelligent document-reading systems, investing millions of dollars in them. The large language models are really game changers and major time savers. It is great that they can be combined with other tools such as OCR and integrated into visual AI systems.

It can help us with many things, including extraction of essential information from images and putting them into text documents or JSON. And in the future, it can revolutionize search engines, and streamline automated text translation or entire workflows of document processing and archiving.

Examples of OCR Software & ChatGPT Working Together

So, now that we can combine computer vision and advanced natural language processing, let’s take a look at how we can use this technology to our advantage.

Reading, Processing and Mining Invoices From PDFs

One of the typical uses of OCR software is reading data from invoices, receipts, or contracts in image-only PDFs (or other documents). Imagine that some of the invoices and receipts your accounting department accepts are physical printed documents. You could scan each document and, instead of opening it in Adobe Acrobat and doing manual data entry (which is still a standard procedure in many accounting departments today), let the automated OCR system handle the rest.

Scanned documents can be automatically sent to the API from both computers and mobile phones. The visual AI needs only a few hundred milliseconds to process an image. Then you will get textual data with the desired structure in JSON or another format. You can easily integrate such technology into accounting systems and internal infrastructures to streamline invoice processing, payments or SKU numbers monitoring.

Receipt analysis via Ximilar OCR and OpenAI ChatGPT.

Trading Card Identifying & Reading Powered by AI

In recent years, the collector community for trading cards has grown significantly. This has been accompanied by the emergence of specialized collector websites, comparison platforms, and community forums. And with the increasing number of both cards and their collectors, there has been a parallel demand for automating the recognition and cataloguing of collectibles from images.

Ximilar has been developing AI-powered solutions for some of the biggest collector websites on the market. And adding an OCR system was an ideal solution for data extraction from both cards and their graded slabs.

We developed an OCR system that extracts all text characters from both the card and its slab in the image. Then GPT processes these texts and provides structured information. For instance, the name of the player, the card, its grade and name of grading company, or labels from PSA.

Extracting text from the trading card via OCR and then using GPT prompt to get relevant information.

Needless to say, we are pretty big fans of collectible cards ourselves. So we’ve been enjoying working on AI not only for sports cards but also for trading card games. We recently developed several solutions tuned specifically for the most popular trading card games, such as Pokémon, Magic: The Gathering, and Yu-Gi-Oh!, and we keep adding new features and games. Do you like the idea of trading card recognition automation? See how it works in our public demo.

How Can I Use the OCR & GPT API On My Images or PDFs?

Our OCR software is publicly available via an online REST API. This is how you can use it:

Log into Ximilar App
- Get your free API TOKEN to connect to the API – Once you sign up to Ximilar App, you will get a free API token, which authenticates your requests. The API documentation is here to help you with the basic setup. You can connect it with any programming language and any platform like iOS or Android. We provide a simple Python SDK for calling the API.
- You can also try the service directly in the App under Computer Vision Platform.
For simple text extraction from your image, call the endpoint read.
```
https://api.ximilar.com/ocr/v2/read
```
For text extraction from an image and its post-processing with GPT, use the endpoint read_gpt. To get the results in the desired structure, you will need to specify the prompt query along with your input images in the API request, and the system will return the results immediately.
```
https://api.ximilar.com/ocr/v2/read_gpt
```
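As a rough illustration of how such a request might be assembled in Python, here is a minimal sketch that prepares a POST request for the read_gpt endpoint without sending it. Only the endpoint URL comes from this article. The `Authorization: Token ...` header, the `records` list with `_url` entries, the `prompt` key, and the helper name `build_read_gpt_request` are assumptions made for the example, so please check the API documentation for the exact field names.

```python
import json
import urllib.request

API_URL = "https://api.ximilar.com/ocr/v2/read_gpt"  # endpoint named in the article


def build_read_gpt_request(token, image_url, prompt):
    """Prepare (but do not send) a POST request for the read_gpt endpoint.

    Assumed, not confirmed by the article: the header format, the "records"
    list with "_url" entries, and the "prompt" key.
    """
    payload = {
        "records": [{"_url": image_url}],
        "prompt": prompt,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Token {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_read_gpt_request(
    token="__API_TOKEN__",
    image_url="__URL_PATH_TO_IMAGE__",
    prompt="Return the player name and the card grade as JSON.",
)
# To actually call the API, pass the request to urllib.request.urlopen(request).
```

Building the request separately from sending it makes it easy to inspect the payload before spending any API credits.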
The output is JSON with an ‘_ocr’ field. This dictionary contains “texts”, a list of the detected words and sentences, each with the polygon that encapsulates it in the image. The full_text field contains all strings concatenated together. The API also returns the language name (“lang_name”) and language code (“lang_code”; ISO 639-1). Here is an example:
```
{
  "_url": "__URL_PATH_TO_IMAGE__",
  "_ocr": {
     "texts": [
       {
          "polygon": [[53.0,76.0],[116.0,76.0],[116.0,94.0],[53.0,94.0]],
          "text": "MICKEY MANTLE",
          "prob": 0.9978849291801453
       },
       ...
     ],
     "full_text": "MICKEY MANTLE 1st Base Yankees",
     "lang_name": "english",
     "lang_code": "en"
  }
}
```
Our OCR engine supports several alphabets (Latin, Chinese, Korean, Japanese and Cyrillic) and languages (English, German, Chinese, …).
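Once you have a result like the example above, working with it is plain dictionary handling. The following sketch assumes a record shaped exactly like that example and treats “prob” as a recognition confidence. The helper names and the 0.9 threshold are hypothetical choices for illustration, not part of the API.

```python
def confident_texts(record, min_prob=0.9):
    """Return detected strings whose "prob" value is at least min_prob."""
    texts = record.get("_ocr", {}).get("texts", [])
    return [t["text"] for t in texts if t.get("prob", 0.0) >= min_prob]


def polygon_bounds(polygon):
    """Convert a polygon (list of [x, y] points) to (x_min, y_min, x_max, y_max)."""
    xs = [point[0] for point in polygon]
    ys = [point[1] for point in polygon]
    return min(xs), min(ys), max(xs), max(ys)


# A record mirroring the example response, shortened to one detected text.
record = {
    "_url": "__URL_PATH_TO_IMAGE__",
    "_ocr": {
        "texts": [
            {
                "polygon": [[53.0, 76.0], [116.0, 76.0], [116.0, 94.0], [53.0, 94.0]],
                "text": "MICKEY MANTLE",
                "prob": 0.9978849291801453,
            }
        ],
        "full_text": "MICKEY MANTLE 1st Base Yankees",
        "lang_name": "english",
        "lang_code": "en",
    },
}

names = confident_texts(record)
box = polygon_bounds(record["_ocr"]["texts"][0]["polygon"])
print(names)  # ['MICKEY MANTLE']
print(box)    # (53.0, 76.0, 116.0, 94.0)
```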

Integrate the Combination of OCR and ChatGPT In Your System

All our solutions, including the combination of OCR & GPT, are available via API. Therefore, they can be easily integrated into your system, website, app, or infrastructure.

Here are some examples of up-to-date solutions that can easily be built on our platform and automate your workflows:

Detection, recognition & text extraction system – You can let the users of your website or app upload images of collectibles and get relevant information about them immediately. Once they take an image of the item, our system detects its position (and can mark it with a bounding box). Then, it recognizes its features (e.g., name of the card, collectible coin or comic book), extracts texts with OCR and you will get text data for your website (e.g., in a table format).
Card grade reading system – If your users upload images of graded cards or other collectibles, our system can detect everything including the grades and labels on the slabs in a matter of milliseconds.
Comic book recognition & search engine – You can extract all texts from each image of a comic book and automatically match it to your database for cataloguing.
Giving your collection or database of collectibles order – Imagine you have a website featuring a rich collection of collectible items, getting images from various sources and comparing their prices. The metadata can be quite inconsistent amongst source websites, or be absent in the case of user-generated content. AI can recognize, match, find and extract information from images based purely on computer vision and independent of any kind of metadata.

Let’s Build Your Solution

If you would like to learn more about how you can automate the workflows in your company, I recommend browsing our page All Solutions, where we briefly explained each solution. You can also check out pages such as Visual AI for Collectibles, or contact us right away to discuss your unique use case. If you’d like to learn more about how we work on customer projects step by step, go to How it Works.

Ximilar’s computer vision platform enables you to develop AI-powered systems for image recognition, visual quality control, and more without knowledge of coding or machine learning. You can combine them as you wish and upgrade any of them anytime.

Don’t forget to visit the free public demo to see how the basic services work. Your custom solution can be assembled from many individual services. This modular structure enables us to upgrade or change any piece anytime, while you save your money and time.

How do custom projects work?

The post When OCR Meets ChatGPT AI in One API appeared first on Ximilar: Visual AI for Business.

Predict Values From Images With Image Regression

Zuzana Raidová — Wed, 22 Mar 2023 15:03:45 +0000

We are excited to introduce the latest addition to Ximilar’s Computer Vision Platform. Our platform is a great tool for building image classification systems, and now it also includes image regression models. They enable you to extract values from images accurately and efficiently, and to save on labor costs.

Let’s take a look at what image regression is and how it works, including examples of the most common applications. More importantly, I will tell you how you can train your own regression system on a no-code computer vision platform. As more and more customers seek to extract information from pictures, this new feature is sure to provide Ximilar’s customers with the tools they need to stay ahead of the curve in today’s highly competitive AI-driven market.

What is the Difference Between Image Categorization and Regression?

Image recognition models are ideal for the recognition of images or objects in them, their categorization and tagging (labelling). Let’s say you want to recognize different types of car tyres or their patterns. In this case, categorization and tagging models would be suitable for assigning discrete features to images. However, if you want to predict any continuous value from a certain range, such as the level of tyre wear, image regression is the preferred approach.

Image regression is an advanced machine-learning technique that can predict continuous values within a specific range. Whenever you need to rate or evaluate a collection of images, an image regression system can be incredibly useful.

For instance, you can define a range of values, such as 0 to 5, where 0 is the worst and 5 is the best, and train an image regression task to predict the appropriate rating for given products. Such predictive systems are ideal for assigning values to several specific features within images. In this case, the system would provide you with highly accurate insights into the wear and tear of a particular tyre.

Predicting the level of tyre wear from an image is a use case for an image regression task, while a categorization task can recognize the pattern of the tyre.
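To make the difference from categorization more concrete, here is a small sketch of how continuous ratings on the 0 to 5 scale could be handled and evaluated. The prediction and label values are made up for illustration, and clamping to the range and scoring with mean absolute error are common choices for regression outputs rather than a description of how the platform works internally.

```python
def clamp(value, low=0.0, high=5.0):
    """Keep a predicted rating inside the defined range."""
    return max(low, min(high, value))


def mean_absolute_error(predicted, actual):
    """Average absolute difference between predictions and labels."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


# Hypothetical raw model outputs and human-assigned ratings for four tyres.
raw_predictions = [4.6, 5.3, -0.2, 2.5]
labels = [5.0, 5.0, 0.0, 2.0]

ratings = [clamp(p) for p in raw_predictions]
mae = mean_absolute_error(ratings, labels)
print(ratings)        # [4.6, 5.0, 0.0, 2.5]
print(round(mae, 3))  # 0.225
```

A categorization model would instead be scored on whether it picked the right label, which is why a continuous target such as wear level fits regression better.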

How to Train Image Regression With a Computer Vision Platform?

Simply log in to Ximilar App and go to Categorization & Tagging. Upload your training pictures and under Tasks, click on Create a new task and create a Regression task.

Creating an image regression task in Ximilar App.

You can train regression tasks and test them via the same front end or with API. You can develop an AI prediction task for your photos with just a few clicks, without any coding or any knowledge of machine learning.

This way, you can create an automatic grading system able to analyze an image and provide a numerical output in the defined range.

Use the Same Training Data For All Your Image Classification Tasks

Both image recognition and image regression methods fall under the image classification techniques. That is why the whole process of working with regression is very similar to categorization & tagging models.

Working with image regression model on Ximilar computer vision platform.

Both technologies can work with the same datasets (training images), and inputs of various image sizes and types. In both cases, you can simply upload your data set to the platform, and after creating a task, label the pictures with appropriate continuous values, and then click on the Train button.

Apart from a machine learning platform, we offer a number of AI solutions that are field-tested and ready to use. Check out our public demos to see them in action.

If you would like to build your first image classification system on a no-code machine learning platform, I recommend checking out the article How to Build Your Own Image Recognition API. We defined the basic terms in the article How to Train Custom Image Classifier in 5 Minutes. We also made a basic video tutorial:

Tutorial: train your own image recognition model with Ximilar platform.

Neural Network: The Technology Behind Predicting Range Values on Images

The simplest technique for predicting float values is linear regression. This can be further extended to polynomial regression. These two statistical techniques work well on tabular input data. However, when it comes to predicting numbers from images, a more advanced approach is required. That’s where neural networks come in. Mathematically speaking, a neural network “f” can be trained to predict a value “y” from a picture “x”, or “y = f(x)”.
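For readers who would like to see the “y = f(x)” idea in its simplest form, here is a minimal sketch of linear regression on tabular data, fitted with the closed-form least-squares solution. The data points are invented for the example. A neural network plays the same role of “f”, only with an image as the input and far more parameters.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

slope, intercept = fit_line(xs, ys)


def f(x):
    """The learned function: predicts y for a new x."""
    return slope * x + intercept


print(slope, intercept)  # 2.0 1.0
print(f(4.0))            # 9.0
```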

Neural networks can be thought of as approximations of functions that we aim to identify through optimization on training data. The most commonly used NNs for image-based predictions are Convolutional Neural Networks (CNNs), Vision Transformers (ViTs), or a combination of both. These powerful tools analyze pictures pixel by pixel, and learn relevant features and patterns that are essential for solving the problem at hand.

CNNs are particularly effective in picture analysis tasks. They are able to detect features at different spatial scales and orientations. Meanwhile, ViTs have been gaining popularity due to their ability to learn visual features without the locality constraints built into convolutions. When used together, these techniques can provide a comprehensive approach to image-based predictions. We can use them to extract the most relevant information from images.

What Are the Most Common Applications of Value Regression From Images?

Estimating Age From Photos

Probably the most widely known use case of image regression is age prediction. You can come across it on social media platforms and in mobile apps, such as Facebook, Instagram, Snapchat, or FaceApp. They apply deep learning algorithms to predict a user’s age based on their facial features and other details.

While image recognition provides information on the object or person in the image, the regression system tells us a specific value – in this case, the person’s age.

Needless to say, these plugins are not always correct and can sometimes produce biased results. Despite this limitation, image regression models are gaining popularity on social sites and in apps.

Ximilar already provides a face-detection solution. Models such as age prediction can be easily trained and deployed on our platform and integrated into your system.

Value Prediction and Rating of Real Estate Photos

Pictures play an essential part on real estate sites. When people are looking for a new home or investment, they are navigating through the feed mainly by visual features. With image regression, you are able to predict the state, quality, price, and overall rating of real estate from photos. This can help with both searching and evaluating real estate.

Predicting rating and price for household images with image regression.

Custom recognition models are also great for the recognition & categorization of the features present in real estate photos. For example, you can determine whether a room is furnished, what type of room it is, and categorize the windows and floors based on their design.

Additionally, a regression can determine the quality or state of floors or walls, as well as rank the overall visual aesthetics of households. You can store all of this information in your database. Your users can then use such data to search for real estate that meets specific criteria.

Image classification systems such as image recognition and value regression are ideal for real estate ranking. Your visitors can search the database with the extracted data.

Determining the Degree of Wear and Tear With AI

Visual AI is increasingly being used to estimate the condition of products in photos. While recognition systems can detect individual tears and surface defects, regression systems can estimate the overall degree of wear and tear of things.

A good example of an industry that has seen significant adoption of such technology is the insurance industry. For example, startups like Lemonade Inc. or Root use AI when paying out insurance claims.

With custom image recognition and regression methods, it is now possible to automate the process of insurance claims. For instance, a visual AI system can indicate the seriousness of damage to cars after accidents or assess the wear and tear of various parts such as suspension, tires, or gearboxes. The same goes with other types of insurance, including households, appliances, or even collectible & antique items.

Our platform is commonly utilized to develop recognition and detection systems for visual quality control & defect detection. Read more in the article Visual AI Takes Quality Control to a New Level.

Automatic Grading of Antique & Collectible Items Such as Sports Cards

Apart from car insurance and damage inspection, recognition and regression are great for all types of grading and sorting systems, for instance on price comparators and marketplaces of collectible and antique items. Deep learning is ideal for the automatic visual grading of collector items such as comic books and trading cards.

By leveraging visual AI technology, companies can streamline their processes, reduce manual labor significantly, cut costs, and enhance the accuracy and reliability of their assessments, leading to greater customer satisfaction.

Food Quality Estimation With AI

Biotech, Med Tech, and Industry 4.0 also have a lot of applications for regression models. For example, they can estimate the approximate level of fruit & vegetable ripeness or freshness from a simple camera image.

The grading of vegetables by an image regression model.

For instance, this Japanese farmer is using deep learning for cucumber quality checks. Looking for quality control or estimation of size and other parameters of olives, fruits, or meat? You can easily create a system tailored to these use cases without coding on the Ximilar platform.

Build Custom Evaluation & Grading Systems With Ximilar

Ximilar provides a no-code visual AI platform accessible via App & API. You can log in and train your own visual AI without the need to know how to code or have expertise in deep learning techniques. It will take you just a few minutes to build a powerful AI model. Don’t hesitate to test it for free and let us know what you think!

Our developers and annotators are also able to build custom recognition and regression systems from scratch. We can help you with the training of the custom task and then with the deployment in production. Both custom and ready-to-use solutions can be used via API or even deployed offline.

How do custom projects work?

The post Predict Values From Images With Image Regression appeared first on Ximilar: Visual AI for Business.

Ximilar Introduces a Brand New App

Zuzana Raidová — Mon, 06 Dec 2021 11:06:53 +0000

An update is never late, nor is it early. It arrives precisely when we mean it to. After tuning up the back end for four years, the time has come to level up the front end of our App as well. We tested multiple ways, got valuable feedback from our users, and now we’re happy to introduce a new interface. It is more user-friendly, offers richer options, and makes it easier to find your way around our growing number of services.

All Important Things at Hand

Ximilar provides a platform for visual AI, where anyone can create, train and deploy custom-made visual AI solutions based on the techniques of machine learning and computer vision. The platform is accessible via API and a web-based App, where users from all around the world work with both ready-to-use and custom solutions. They implement them into their own apps, quality control or monitoring systems in factories, healthcare tools and so on.

We created the new interface to adapt to the ever-increasing number of services we provide. It now makes better use of both the dashboard and sidebar, showcases useful articles and guides, and provides more support. So, let’s take a look at the major new features!

Service Categories & News

We grouped our services based on how they work with data and the degree of possible customization. After you log into the application, you will see the cards of four service groups with short descriptions on the dashboard. Below them, you can see the newest articles from our Blog, where we publish a lot of useful tips on how to create and implement custom visual AI solutions.

The service groups are as follows:

Ready-to-use Image Recognition includes all the services that you can use straight away without the need for additional training, custom tags and labels. In principle, these services analyze your data (i.e., your image collection) and provide you with information based on image recognition, object detection, analysis of colors & styles etc. Here you will find Fashion Tagging, Home Decor Tagging, Photo Tagging and Dominant Colors.
Custom Image Recognition allows you to train custom Categorization & Tagging and Object Detection models. Flows, which enable you to combine the models, are also under this category. To prepare the training data for object detection seamlessly and fast, you can use our own tool Annotate.
Visual Search encompasses all services able to identify, analyze and compare visually similar content. Image Similarity can find, compare and recommend visually similar images or products. You can also use Image Matching to identify duplicates or near-duplicates in your collection, or create a fully custom visual search. Fashion Search is a complex service based on visual search and fashion tagging for apparel image collections.
Image Tools are online tools based on computer vision and machine learning that modify an image you provide. You can then either use the result or implement these image tools in your Flows. Here you will find Remove Background and Image Upscaler.

Do you want to learn more about AI and machine learning? Check the list of The Best Resources on Artificial Intelligence and Machine Learning.

Discover Services

Within the service groups, you can now browse all our services, including the ones that are not in your pricing scheme. Every service dashboard features a service overview and links to documentation, useful guides, case studies & video tutorials.

Do you want to know what you pay for when using our App? Check our article on API credit packs or the documentation.

Guides & Help at Hand

The sidebar underwent some major changes. It now displays all service groups and services. At the bottom, you will find the Guides & Help section with all necessary links to the beginner App Overview tutorial, Guides, Documentation & Contacts in case you need help.

How to make the most of a computer vision solution? Our guides are packed with useful tips & tricks, as well as first-hand experience of our machine learning specialists.

Customize the Sidebar With Favorites

Since each use case is highly specific, our users usually use a small group of services or only one service at a time. That is why you can now pin your most-used services as Favorites.

When you first log into the new front end, all of your previously used services will be marked as Favorites. You can then choose which of them will stay on top.

What’s next?

This front-end update is just the first of many steps we’ve been working on. We focus on adding some major features to the platform, such as explainability, as well as custom image regression models. The Ximilar platform provides one of the most advanced Visual AI tools with API on the market, and you can test them for free. Nevertheless, the key to improving our services and App is your opinions and user experience. Let us know what you think!

The post Ximilar Introduces a Brand New App appeared first on Ximilar: Visual AI for Business.

Ximilar Introduces API Credit Packs

Zuzana Raidová — Tue, 27 Apr 2021 15:34:49 +0000

In the year 2021, we are going to implement some major updates and add new features to our App. They should make the user experience more convenient and the work environment more customizable. The first new feature is the API Credit Packs, created specifically in response to your requests and suggestions. In this article, I briefly describe the main benefits of API credit packs and how to use them.

How API Credits Work

Imagine you upload a training image, create a recognition label, or send an image for recognition in our App. Every time you perform an operation like this, you send a request to our server using API. This request is called an API call.

To keep track of API calls and their requirements, each type of call corresponds to a certain number of API credits. Generally, all calls sending image data to our servers cost some API credits. The full list of operations with their API credit values is available in our documentation.

See API documentation

Your Monthly API Credits

Every user of the Ximilar App is provided with a monthly supply of API credits, depending on their pricing plan. This supply is renewed every month on the day they made the purchase of their plan. For example, if you purchase a Business plan on April 15th, your monthly supply will be restored on the 15th day of every subsequent month.

The users with the Free pricing plan are provided with a monthly supply of API credits as well. Whether you use a paid or free plan, the unused API credits from your monthly supply are not transferred to the following month and expire.
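The renewal rule described above can be sketched in a few lines of Python. The function names are hypothetical, and the handling of purchase days that do not exist in a shorter month (for example the 31st in February) is an assumption made for the example, since the article does not say how such cases are treated.

```python
import calendar
from datetime import date


def renewal_in_month(year, month, purchase_day):
    """Renewal date within a given month.

    Assumption: if the month is shorter than the purchase day,
    the renewal falls on the last day of that month.
    """
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(purchase_day, last_day))


def next_renewal(purchase_day, today):
    """Return the first renewal date strictly after `today`."""
    candidate = renewal_in_month(today.year, today.month, purchase_day)
    if candidate > today:
        return candidate
    if today.month == 12:
        return renewal_in_month(today.year + 1, 1, purchase_day)
    return renewal_in_month(today.year, today.month + 1, purchase_day)


# A plan purchased on April 15th renews on the 15th of each following month.
print(next_renewal(15, date(2021, 4, 27)))  # 2021-05-15
```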

See Pricing

Introducing API Credit Packs

Ximilar App users can now buy an unlimited number of API credits aside from their monthly supply, in the form of API credit packs. This option is available for all pricing plans, including the Free plan.

There are two major benefits of the API credit packs. First, credits from the packs are used only when your monthly supply of credits runs out. In this example, the user with the Business plan has already used all API credits from their monthly supply and the system automatically switched to using the API credit pack. On April 15th, their monthly credit balance will be renewed, and the system will switch back to the monthly supply.

Second, API credit packs have no expiration. Therefore, their balance passes to the next month. You can buy as many credit packs as you need. The credits will add up in the lower API credit bar.
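The two rules above (the monthly supply is spent first, and only pack credits carry over) can be modelled with a small sketch. The class name, method names, and credit amounts are hypothetical and serve only to illustrate the order of consumption, not the actual billing implementation.

```python
from dataclasses import dataclass


@dataclass
class CreditBalance:
    monthly: int  # credits left from the monthly supply
    pack: int     # credits left from purchased credit packs

    def spend(self, credits):
        """Deduct credits: monthly supply first, then the credit packs."""
        if credits < 0:
            raise ValueError("credits must be non-negative")
        if credits > self.monthly + self.pack:
            raise ValueError("not enough credits")
        from_monthly = min(credits, self.monthly)
        self.monthly -= from_monthly
        self.pack -= credits - from_monthly

    def renew(self, monthly_supply):
        """Monthly renewal: unused monthly credits expire, pack credits stay."""
        self.monthly = monthly_supply


balance = CreditBalance(monthly=1000, pack=2000)
balance.spend(1200)  # uses all 1000 monthly credits, then 200 from the pack
after_spend = (balance.monthly, balance.pack)
balance.renew(1000)  # monthly supply restored, pack balance unchanged
after_renew = (balance.monthly, balance.pack)
print(after_spend)  # (0, 1800)
print(after_renew)  # (1000, 1800)
```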

Typical Uses for API Credit Packs

The credit packs cover both expected and unexpected system loads. There are several situations in which they can help or serve as safety nets.

Get Your System Ready

Our users generally pick their pricing plan based on regular traffic on their websites. However, the initial service setup is more demanding, and it costs a lot of extra credits. In this case, you wouldn’t want to upgrade your pricing plan for the short period of higher workloads and then downgrade back to the plan suiting your long-term needs.

One-Time System Loads

As you could see in the example with a Business plan user, the number of API credits in the credit pack bar was twice as high as their monthly credit supply. It is common for our users to use an above-average number of credits from time to time – typically when they are expecting higher system loads than usual. For example, uploading more products and images, or adding a brand new collection, would mean exhausting your monthly credit supply too soon. In such cases, API credit packs provide a cost-effective solution.

Safety Net in Case of Higher Traffic

The credit packs also cover the situations of unpredicted system loads caused by third parties. For example, when your website is visited and the system is used by an unexpected number of customers in a short period.

This way, the credit packs provide a sort of safety net to make sure no service outages will occur on your side due to the sudden exhaustion of credits.

What if I Upgrade or Downgrade My Plan?

You can always upgrade or downgrade your pricing plan. When this happens, the credits from your previous plan’s monthly supply will be added to the monthly supply of your new plan. They will remain in the bar till the end of your old monthly subscription and will be used first. In addition, you can purchase as many credit packs as you need, and the credits from the packs will be used after both of your monthly supplies are exhausted.

Do you have any questions? We’re more than happy to talk.

Try our public demos

The post Ximilar Introduces API Credit Packs appeared first on Ximilar: Visual AI for Business.

Visual AI Takes Quality Control to a New Level

Michal Lukáč — Wed, 24 Feb 2021 16:08:27 +0000

Have you heard about The Big Hack? The Big Hack story was about a tiny probe (small chip) inserted into computer motherboards by Chinese manufacturing companies. Attackers could then infiltrate any server workstation containing these motherboards, many of which were installed in large US-based companies and government agencies. The thing is, the probes were so small, and the motherboards so complex, that they were almost impossible to spot by the human eye. You can take this post as a guide to help you navigate the latest trends of AI in the industry with a primary focus on AI-based visual inspection systems.

AI Adoption by Companies Worldwide

Let’s start with some interesting stats and news. The expansion of AI and Machine Learning is becoming common across numerous industries. According to this report by Stanford University, AI adoption is increasing globally. More than 50 % of respondents said their companies were using AI, and the adoption growth was greatest in the Asia-Pacific region. Some people refer to the automation of factory processes, including digitalization and the use of AI, as the Fourth Industrial Revolution (and so-called Industry 4.0).

AI adoption by industry and function [Source]

The data show that the Automotive industry is the largest adopter of AI in manufacturing, making heavy use of machine learning, computer vision, and robotics. Other industries, such as Pharma or Infrastructure, are using computer vision in their production lines as well. Financial services, on the other hand, are using AI mostly in operations, marketing & sales (with a focus on Natural Language Processing – NLP).

AI technologies per industry [Source]

The MIT Technology Review cited leading artificial intelligence expert Andrew Ng, who has been helping tech giants like Google implement AI solutions, as saying that factories are AI’s next frontier. For example, while it would be difficult to inspect parts of electronic devices with our eyes, the inexpensive camera of a recent Android phone or iPhone can provide high-resolution images that can be fed into any industrial system.

Adopting AI brings major advantages, but also potential risks that need to be mitigated. It is no surprise that companies are mainly concerned about the cybersecurity of such systems. Imagine you could lose a billion dollars if your factory stopped working (like Honda in this case). Other obstacles are potential errors in machine learning models. There are techniques for discovering such errors, such as the explainability of AI systems. For now, the explainability of AI is a concern of only 19 % of companies, so there is room for improvement. Getting insight from the algorithms can improve the processes and quality of the products. Other than security, there are also political & ethical questions (e.g., job replacement or privacy) that companies are worried about.

This survey by McKinsey & Company brings interesting insights into Germany’s industrial sector. It demonstrates the potential of AI for German companies in eight use cases, one of which is automated quality testing. The expected benefit is a 50% productivity increase due to AI-based automation. Needless to say, Germany is a bit ahead with the AI implementation strategy – there are already several plans made by German institutions to create standardised AI systems that will have better interoperability, certain security standards, quality criteria, and test procedures.

Highly developed economies like Germany, with a high GDP per capita and challenges such as a quickly ageing population, will increasingly need to rely on automation based on AI to achieve GDP targets.
McKinsey & Company

Another study by PwC predicts that the total expected economic impact of AI in the period until 2030 will be about $15.7 trillion. The greatest economic gains from AI are expected in China (26% higher GDP in 2030) and North America.

What is Visual Quality Control?

The human visual system is naturally very selective in what it perceives, focusing on one thing at a time and not actually seeing the whole image (direct vs. peripheral view). The cameras, on the other hand, see all the details, and with the highest resolution possible. Therefore, stories like The Big Hack show us the importance of visual control not only to ensure quality but also safety. That is why several companies and universities decided to develop optical inspection systems engaging machine learning methods able to detect the tiniest difference from the reference board.

Motherboards by Super Micro [Source: Scott Gelber]

In general, visual quality control is a method or process to inspect equipment or structures to discover defects, damages, missing parts, or other irregularities in production or manufacturing. It is an important method of confirming the quality and safety of manufactured products. Optical inspection systems are mostly used for visual quality control in factories and assembly lines, where the control would be hard or ineffective with human workers.

What Are the Main Benefits of Automatic Visual Inspection?

Here are some of the essential reasons why automatic visual inspection brings a major advantage to businesses:

The human eye is imprecise – Even though our visual system is a magnificent thing, it needs a lot of “optimization” to be effective, making it prone to optical illusions. The focused view can miss many details, and our visible spectrum is limited (380–750 nm), so we are unable to perceive NIR wavelengths (source). Cameras and computer systems, on the other hand, can be calibrated to different conditions. Cameras are more suitable for highly precise analyses.
Manual checking – Manual checking of the items one by one is a time-consuming process. Smart automation allows more items to be processed and checked in less time. It also reduces the number of defective items that are released to customers.
The complexity – Some assembly lines can produce thousands of various products of different shapes, colours, and materials. For humans, it can be very difficult to keep track of all possible variations.
Quality – Providing better and higher quality products by reducing defective items and getting insights into the critical parts of the assembly line.
Risk of damage – Machine vision can reduce the risk of item damage and contamination by a person.
Workplace safety – Making the work environment safer by inspecting it for potentially dangerous actions (e.g., detection of protective equipment such as safety helmets on construction sites), inspection in radioactive or biohazard environments, detection of fire, covid face masks, and many more.
Saving costs – Labour can be pretty expensive in the Western world. For example, the average salary of a quality control inspector in the US is about 40k USD. Companies consider numerous options when saving costs, such as moving the factories to other countries, streamlining the operations, or replacing the workers with robots. And as I said before, this goes hand in hand with some political & ethical questions. I think the most reasonable solution in the long term is the cooperation of workers with robotic systems. This will make the process more robust, reliable, and effective.
Costs of AI systems – Sooner or later, modern technology and automation will be common in all companies (startups as well as enterprise companies). The adoption of automatic solutions based on AI will make the transition more affordable.

Where is Visual Quality Control Used?

Let’s take a look at some of the fields where AI visual control helps:

Cosmetics – Inspection of beauty products for defects and contaminations, colour & shape checks, controlling glass or plastic tubes for cleanliness and rejecting scratched pieces.
Pharma & Medical – Visual inspection for pharmaceuticals: rejecting defective or unfilled capsules and tablets, checking the filling level of bottles and the integrity of items, or detecting surface imperfections of medical devices. High-resolution recognition of materials.
Food Industry and Agriculture – Food and beverage inspection for freshness. Checking the presence or position of label prints, barcodes, and QR codes.

A great example of industrial IoT is this story about a Japanese cucumber farmer who developed a monitoring system for quality check with deep learning and TensorFlow.

Automotive – Examination of forged metallic parts, plastic parts, cracks, stains or scratches in the paint coating, and other surface and material imperfections. Monitoring quality of automotive parts (tires, car seats, panels, gears) over time. Engine monitoring and predictive autonomous maintenance.
Aerospace – Checking for the presence and quality of critical components and material, spotting the defective parts, discarding them, and therefore making the products more reliable.
Transportation – Rail surface defects control (example), aircraft maintenance check, or baggage screening in airports – all of them require some kind of visual inspection.
Retail/Consumer Goods & Fashion – Checking assembly line items made of plastics, polymers, wood, and textile, and packaging. Visual quality control can be deployed for the manufacturing process of the goods. Sorting imprecise products.
Energy, Mining & Heavy Industries – Detecting cracks and damage in wind blades or solar panels, visual control in nuclear power plants, and many more.

It’s interesting to see that more and more companies choose collaborative platforms such as Kaggle to solve specific problems. In 2019, the contest by the Russian company Severstal on Kaggle led to dozens of solutions for the steel defect detection problem.

Steel defects [Source: Kaggle]

Other – For example, safety checks: detecting whether people are present in specific zones of the factory, whether they are wearing helmets, or stopping a robotic arm when a worker is nearby.

The Technology Behind AI Quality Control

There are several different approaches and technologies that can be used for visual inspection on production lines. The most common ones nowadays use some kind of neural network model.

Neural Networks – Deep Learning

Neural Networks (NN) are computational models that accept the input data and output relevant information. To make the neural network useful (finding the weights for the connection between the neurons and layers), we need to feed the network with some initial training data.

The advantage of neural networks is their power to internally represent training data, which leads to the best performance in computer vision compared to other machine learning models. However, they also bring challenges, such as computational demands, overfitting, and others.

[Un|Semi|Self] Supervised Learning

If a machine-learning algorithm (NN) requires ground truth labels, i.e. annotations, then we are talking about supervised learning. If not, then it is an unsupervised method or something in between: a semi-supervised or self-supervised method. Building an annotated dataset is much more expensive than simply obtaining data with no labels, so the good news is that the latest research in neural networks tackles problems with unsupervised learning.

On the left is the original item without any defects, on the right, a slightly damaged one. If we know the labels (OK/DEFECT), we can train a supervised machine-learning algorithm. [Source: Kaggle]

Here is the list of common services and techniques for visual inspection:

Image Recognition – Simple neural network that can be trained for categorization or error detection on products from images. The most common architectures are based on convolution (CNN).
Object Detection – Model able to predict the exact position (bounding box) of specific parts. Suitable for defect localization and counting.
Segmentation – More complex than object detection, image segmentation gives you a pixel-level prediction.
Image Regression – Regress/get a single decimal value from the image. For example, estimating the level of wear of an item.
Anomaly Detection – Shows which image contains an anomaly and why. Mostly done with GANs or Grad-CAM.
OCR – Optical Character Recognition is used for getting and reading text from images.
Image matching – Matching the picture of the product to the reference image and displaying the difference.
Other – There are also other solutions that do not require data at all, most of the time using some simple, yet powerful computer vision technique.
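
To make the image matching idea from the list above more concrete, here is a minimal sketch that compares a product photo with a reference image and marks the pixels that differ. It is only an illustration under stated assumptions: images are plain grayscale grids of 0–255 values, and the function name diff_mask and the threshold of 30 are hypothetical choices for this example, not part of any particular library or of the Ximilar platform.

```python
from typing import List, Tuple

Image = List[List[int]]  # grayscale pixels, 0-255 (an assumption for this sketch)


def diff_mask(reference: Image, sample: Image,
              threshold: int = 30) -> Tuple[List[List[bool]], float]:
    """Mark pixels whose absolute difference from the reference exceeds the threshold.

    Returns the boolean mask and the fraction of pixels that were marked.
    """
    if len(reference) != len(sample) or any(
        len(r) != len(s) for r, s in zip(reference, sample)
    ):
        raise ValueError("images must have the same shape")
    mask = [
        [abs(r - s) > threshold for r, s in zip(ref_row, smp_row)]
        for ref_row, smp_row in zip(reference, sample)
    ]
    total = sum(len(row) for row in mask)
    changed = sum(cell for row in mask for cell in row)
    return mask, (changed / total if total else 0.0)


# Toy 3x3 images: the sample has one dark spot and one negligible change.
reference = [[200, 200, 200], [200, 200, 200], [200, 200, 200]]
sample = [[200, 200, 200], [200, 90, 200], [200, 200, 195]]

mask, ratio = diff_mask(reference, sample)
print(mask[1][1], mask[2][2], round(ratio, 3))
```

In a real line, the two images would first have to be aligned and normalized for lighting, which is where the learned models listed above usually take over.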

If you would like to dive a bit deeper into the process of building a model, you can check my posts on Medium, such as How to detect defects on images.

Typical Types and Sources of Data for Visual Inspection

Common Data Sources

Thermal imaging example [Source: Quality Magazine]

RGB images – The most common data type and the easiest to get. A simple 1080p camera that you can connect to a Raspberry Pi costs about $25.

Thermography – Thermal quality control via infrared cameras, mostly used to detect flaws under the surface that are not visible to simple RGB cameras, and for gas imaging, fire prevention, and monitoring the behaviour of electronics under different conditions. If you want to know more, I recommend reading the articles in Quality Magazine.

3D scanning, Lasers, X-ray, and CT scans – Creating 3D models from special depth scanners gives you a better insight into material composition, surface, shape, and depth.

Microscopy – Due to the rapid development and miniaturization of technologies, sometimes we need a more detailed and precise view. Microscopes can be used in an industrial setting to ensure the best quality and safety of products. Microscopy is used for visual inspection in many fields, including material sciences and industry (stress fractures), nanotechnology (nanomaterial structure), or biology & medicine. There are many microscopy methods to choose from, such as stereomicroscopy, electron microscopy, opto-digital or purely digital microscopes, and others.

Common Inspection Errors

scratches
patches
knots, shakes, checks, and splits in the wood
crazing
pitted surface
missing parts
label/print damage
corrosion
coating nonuniformity

Surface crazing and cracking on brake discs [source], crazing in polymer-grafted nanoparticle film [source], and wood shakes [source].

Examples of Datasets for Visual Inspection

Severstal Kaggle Dataset – A competition for the detection of defects on flat sheet steel.
MVTec AD – 5000 high-resolution annotated images of 15 items (divided into defective and defect-free categories).
Casting Dataset – Casting is a manufacturing process in which a liquid material is usually poured into a form/mould. About 7 thousand images of submersible pump defects.
Kolektor Surface-Defect Dataset – Dataset of microscopic fractions or cracks in electrical commutators.
PCB Dataset – Annotated images of printed circuit boards.

AI Quality Control Use Cases

We talked about a wide range of applications for visual control with AI and machine learning. Here are three of our use cases for industrial image recognition we worked on in 2020. All these cases required an automatic optical inspection (AOI) and partial customization when building the model, working with different types of data and deployment (cloud/on-premise instance/smartphone). We are glad to hear that during the COVID-19 pandemic, our technologies help customers keep their factories open.

Our typical workflow for a customized solution is the following:

Setup, Research & Plan: If we don’t know how to solve the problem from the initial call, our Machine Learning team does the research and finds the optimal solution for you.
Gathering Data: We sit with your team and discuss what kind of data samples we need. If you can’t acquire and annotate data yourself, our team of annotators will work on obtaining a training dataset.
First prototype: Within 2–4 weeks we prepare the first prototype or proof of concept. The proof of concept is a lightweight solution for your problem. You can test it and evaluate it by yourself.
Development: Once you are satisfied with the prototype results, our team can focus on the development of the full solution. We work mostly in an iterative way, improving the model and obtaining more data if needed.
Evaluation & Deployment: If the system performs well and meets the criteria set up in the first calls (mostly some evaluation on the test dataset and speed performance), we work on the deployment. It can run in our cloud, on-premise, or on embedded hardware in the factory. It’s up to you. We can even provide the source code so your team can edit it in the future.
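
The evaluation step mentioned in the workflow above (accuracy on a test dataset plus speed) can be sketched roughly as follows. This is a hedged illustration, not our actual evaluation code: the function evaluate, the toy_predict stand-in model, and the brightness-based test samples are all hypothetical names and data made up for the example.

```python
import time
from typing import Callable, Sequence, Tuple


def evaluate(predict: Callable[[float], str],
             samples: Sequence[Tuple[float, str]]) -> Tuple[float, float]:
    """Return (accuracy, items per second) of `predict` over labeled test samples."""
    if not samples:
        raise ValueError("the test set must not be empty")
    start = time.perf_counter()
    correct = sum(predict(item) == label for item, label in samples)
    elapsed = time.perf_counter() - start
    accuracy = correct / len(samples)
    throughput = len(samples) / elapsed if elapsed > 0 else float("inf")
    return accuracy, throughput


# Hypothetical stand-in for a trained model: flags items darker than a cutoff.
def toy_predict(mean_brightness: float) -> str:
    return "DEFECT" if mean_brightness < 100 else "OK"


# Hypothetical held-out test set of (mean brightness, ground-truth label) pairs.
test_set = [(220.0, "OK"), (80.0, "DEFECT"), (95.0, "OK"), (60.0, "DEFECT")]

accuracy, throughput = evaluate(toy_predict, test_set)
print(f"accuracy={accuracy:.2f}, throughput={throughput:.0f} items/s")
```

The point of the design is that the test set stays independent of the training data, and that both numbers are compared against the criteria agreed on in the first calls.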

Use case: Image recognition & OCR for wood products

One of our customers contacted us with a request to build a system for categorization and quality control of wooden products. With Ximilar Platform we were able to easily develop and deploy a camera system over the assembly line that sorted the products into the bins. The system can identify the defective print on the products with optical character recognition technology (OCR), and the surface control of wood texture is enabled by a separate model.

Printed text on wood [Source: Ximilar]

The technology is connected to a simple smartphone/tablet camera in the factory and can handle tens of products per second. This way, our customer was able to reduce rework and manual inspections which led to saving thousands of USD per year. This system was built with the Ximilar Flows service.

Use case: Spectrogram analysis from car engines

Another project we successfully deployed was the detection of malfunctioning engines. We did it by transforming the sound input from the car into an image spectrogram. After that, we trained a deep neural network that recognises problematic car engines and can tell you the specific problem of the engine.

The good news is that this system can also detect anomalies in an unsupervised way (no need for data labelling) with the GAN technology.
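
The sound-to-image step described above can be sketched with a short-time Fourier transform. The code below is a minimal, dependency-free illustration and not the pipeline we deployed: the frame size, hop length, Hann window, the 1024 Hz sample rate, and the synthetic 128 Hz tone standing in for an engine recording are all assumptions made for the example. In practice, a numerical library would replace the naive DFT loop.

```python
import cmath
import math
from typing import List


def spectrogram(signal: List[float], frame_size: int = 64,
                hop: int = 32) -> List[List[float]]:
    """Return magnitude spectra of overlapping, Hann-windowed frames (time x frequency)."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_size - 1))
              for n in range(frame_size)]
    frames = []
    for start in range(0, len(signal) - frame_size + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_size)]
        # Naive DFT, keeping only the non-negative frequency bins.
        spectrum = [
            abs(sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                    for n in range(frame_size)))
            for k in range(frame_size // 2 + 1)
        ]
        frames.append(spectrum)
    return frames


sample_rate = 1024  # Hz, assumed for this sketch
tone_hz = 128       # synthetic stand-in for a dominant engine frequency
signal = [math.sin(2 * math.pi * tone_hz * t / sample_rate) for t in range(512)]

spec = spectrogram(signal)
# Index of the strongest frequency bin in the first time frame.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
print(len(spec), len(spec[0]), peak_bin * sample_rate / 64)
```

Each row of the result is one time slice, so the whole grid can be rendered as a grayscale image and passed to an image recognition model like any other picture.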

Spectrogram from Engine [Source: Ximilar]

Use case: Wind turbine blade damage from drone footage

[Source: Pexels]

According to Bloomberg, there is no simple way to recycle a wind turbine, and it is therefore crucial to prolong the lifespan of wind power plants. They can be hit by lightning or affected by extreme weather and other natural forces.

That’s why we developed a system for our customers that checks rotor blade integrity and damage using drone video footage. The videos are uploaded to the system, and inspection is done with an object detection model identifying potential problems. There are thousands of videos analyzed in one batch, so we built a workstation (with NVIDIA RTX GPU cards) able to handle such a load.

Ximilar Advantages in Visual AI Quality Control

An end-to-end and easy-to-use platform for Computer Vision and Machine Learning, with enterprise-ready features.
Processing hundreds of images per second on an average computer.
Train your model in the cloud and use it offline in your factory without an internet connection. Thanks to TensorFlow, you can use the model on any computer, edge device, GPU card, or embedded hardware (Raspberry Pi or NVIDIA Jetson connected to a camera). We also provide optimized CPU models on Intel devices through OpenVINO technology.
Easily gather more data and teach models on new defects within a day.
Evaluation on an independent dataset, and model versioning.
A customized yet affordable solution providing the best outcome with pixel-accurate recognition.
Advanced image management and annotation platform suitable for creating intelligent vision systems.
Image augmentation settings that can be tuned for your problem.
Fast machine learning models that can be connected to your industrial camera or smartphone for industrial image processing robust to lighting conditions, object motion, or vibrations.
Great team of experts, available to communicate and help.

To sum up, it is clear that artificial intelligence and machine learning are becoming common in the majority of industries working with automation, digital data, and quality or safety control. Machine learning definitely has a lot to offer to factories with manual or robotic assembly lines, or even fully automated production, but also to various specialized fields, such as material sciences and the pharmaceutical and medical industries.

Are you interested in creating your own visual control system?

How do custom projects work?

The post Visual AI Takes Quality Control to a New Level appeared first on Ximilar: Visual AI for Business.

The Year 2020 at Ximilar

Michal Lukáč — Tue, 12 Jan 2021 09:22:52 +0000

Well, the year 2020 will stay in our memory for a long time. For many, it was a sad year. For those in the online business, Covid-19, aka Coronavirus, accelerated online shopping like nothing before. Smart retail became an even more critical part of a successful company. And the Ximilar team worked harder than ever to help with that.

Object Detection — Customized

For some of our customers, we already had the opportunity to train custom detection models. However, we decided to integrate this into the popular app.ximilar.com and not only keep it in our Annotate tool.

The biggest news is that we re-implemented a new architecture for Object Detection (called CenterNet) using TensorFlow 2+, and made it open-source for you guys out there. The system is more scalable and faster than before, and we still have multiple ideas for further improvements. Creating detection models has so far been a long and painful process, so we believe that you will love the new service as it significantly speeds up the workflow. We’ve also cloned favourite features from Image Recognition, so you can now configure custom image augmentation settings, version your models, download models for offline usage, evaluate them on an independent test dataset, connect them to a flow, and more. It was a ride! More in the video ↓

Flows

Flows are a service that simplifies the process of building computer vision systems.

We spent a lot of man-hours on this feature. Ximilar is the only system on the market that can connect visual AI models into complex workflows without any coding. Before we had Flows, it was really painful to connect individual recognition or detection models into one API service. Now it is effortless, and we again have more features coming.

Flows are incredibly powerful, offer endless possibilities, and save the costs of expensive machine learning development. You should see the happy faces of Flows users.

Imagine that you are creating a car monitoring system for a parking lot. You can create a detection model for parking spots, then analyze the individual spots with image recognition models and decide whether each one is occupied or not. With Flows, you just connect several actions and the system is automatically deployed for you! This is something that takes a long time for a team of engineers to develop, but with Ximilar Flows, the entire process can be done in just minutes.
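
To show what such a chain of actions does under the hood, here is a plain Python sketch of the parking example: one step detects the spots, the next classifies each cropped spot. It does not use the Ximilar API or the Flows service itself; run_flow, toy_detect, toy_classify, and the tiny grayscale image are hypothetical stand-ins invented for this illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), assumed pixel coordinates
Image = List[List[int]]          # grayscale image as rows of 0-255 pixel values


@dataclass
class SpotStatus:
    box: Box
    label: str


def crop(image: Image, box: Box) -> Image:
    """Cut the region described by the box out of the image."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in image[y0:y1]]


def run_flow(image: Image,
             detect: Callable[[Image], List[Box]],
             classify: Callable[[Image], str]) -> List[SpotStatus]:
    """Step 1: detect parking spots. Step 2: classify each cropped spot."""
    return [SpotStatus(box, classify(crop(image, box))) for box in detect(image)]


# Hypothetical stand-in for a detection model: returns two fixed spots.
def toy_detect(image: Image) -> List[Box]:
    return [(0, 0, 2, 2), (2, 0, 4, 2)]


# Hypothetical stand-in for a recognition model: dark patches count as occupied.
def toy_classify(patch: Image) -> str:
    pixels = [p for row in patch for p in row]
    return "occupied" if sum(pixels) / len(pixels) < 128 else "free"


image = [[30, 40, 200, 210], [35, 45, 205, 215]]
statuses = run_flow(image, toy_detect, toy_classify)
for status in statuses:
    print(status.box, status.label)
```

Keeping detection and recognition as separate, swappable steps is the same idea that lets the actions in a flow be rearranged without retraining everything.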

Fashion Tagging and Search Improved

With custom object detection and Flows, we were able to build one of the best systems for visual apparel analysis (Clothing, Underwear, Footwear, Jewelry, Watches, Accessories, Bags, Hats, Glasses, etc.). The entire system consists of a hundred models integrated into one flow. The system is also able to detect individual clothing items and can tell you more information about the background and particular view or detail. Ximilar is ready to create a custom profile for you with the customization of tag names. And not only that.

Because some of our customers use all three of our services for Fashion (Tagging, Similarity, and Visual Search), we have also created a full-featured Fashion Search which includes all three. This could save you a lot of money and provide the fastest solution for your e-commerce outlet or mobile app.

Some use cases require image analysis on a deeper level. We call such a service META image analysis. That is great for automated decisions such as:

is there a person in the photo?
is the product dominant in the image?
what kind of background colour is there?
are there additional details of the item to see?
is this a front or a side view of the object in the photo?

Building a high-quality image tagging system for any other field (real estate, stock photos, product categorization, etc.) is very effective on the Ximilar platform. As an example, we have pre-built visual AI for home decor products right in the app.

New: Custom Image Similarity

Would you like to build a visual search engine, for example for skincare products, but you don’t know how? Check out the website of our customer Skintory. We have taken on the challenge of creating Visual Search/Image Similarity as a service, so you can train custom similarity models and create visual product searches in one click.

This could really change how we interact and search on retail sites. For most of the use cases, the generic similarity model trained on stock photos is not working as well as it could. So we took the opportunity to work on a service for the training of customized image similarity models with an integrated search engine.

That feature is still in BETA, and our AI team is working hard to deliver the best experience soon. If you would like to test this service, please contact us directly.

Multiple App Improvements

We have not abandoned our older services, such as Image Recognition. There are new features and improvements. The most visible additions are these:

complex management of your image data
new services in the Ximilar App and one-click login with social networks
evaluation of models on the independent test dataset
improved advanced options for better image augmentation
moving to TensorFlow 2, adding export of offline models on edge devices, and more

Annotate

It is our internal image annotation system, which was released to the public at the beginning of the year 2020. It offers a workflow for teams (annotation jobs), multiple verifications of your image data, and more. If you are working on a large machine learning project that requires a balanced and high-quality dataset → ask us to show you Annotate. We are constantly working on improvements, so your team can deliver your final product faster with higher accuracy. Read about image annotation for teams.

Cooperations & Partnership

Thanks to Intel, our prediction system is running faster than ever. Our AI and Backend engineers work together with the Intel team (thank you, Ellen and Vishnu) on the optimization of machine learning models on the x86 architecture with OpenVINO. We changed the backend of our models from TF2 to OpenVINO and got a massive speed-up of performance. Now our entire fashion system is not only the most accurate but also the fastest one. Of course, you can build any other visual inspection application on top of the Ximilar Platform. We have published an in-depth behind-the-scenes article.

In the middle of 2020, we also became a member of the NVIDIA Inception Program, which supports cutting-edge AI startups that are revolutionizing industries. We are looking forward to being active in the area, thanks to the Brno.AI platform, which supports companies and universities in Czechia.

Open Source Projects

Our tech team was quite active in the open-source community, implementing and publishing machine learning projects:

xCenterNet — fast object detection model
tf-image — image augmentation for TensorFlow that makes your model more robust
TF-metric-learning — distance metric learning library

Tutorial videos

Lastly, with the support of JIC Brno, Ximilar released video tutorials for the Platform, the first series of educational videos.

And more in 2021!

In 2020, we grew on all levels. We have more happy customers and a bigger team. It was not easy to manage everything during the early Covid-19 era. Some of us spent a lot of time working from home. Luckily, we have a great team that supports our idea of making machine learning and computer vision a pleasant and creative process.

We are working on improving stable services as well as creating new projects. There are many interesting topics that we plan to probe such as Custom similarity, Explainability/Interpretability, Regression from image, a combination of image and text data, a model for background removal, image super-resolution service, and many more. Stay tuned & healthy for next year at least!

If you have any ideas that you would love to have on the platform, then please let us know.

Try our public demos

The post The Year 2020 at Ximilar appeared first on Ximilar: Visual AI for Business.

Behind The Scene: Ximilar and Intel AI Builders Program

Libor Vanek — Thu, 17 Dec 2020 12:12:11 +0000

When Ximilar was founded, many crucial decisions needed to be made. One of them was where to run our software. After careful consideration, our first servers were bought and subsequently, all our services have run on our own hardware.

This, of course, brought a great number of challenges, but also many opportunities. We would like to share with you some of the latest accomplishments. Mainly, we are proud to announce that we have become a part of the Intel AI Builders Program. Having direct access to technical resources and support from Intel helps us use our hardware in the best way possible.

Running the entire MLaaS (Machine learning as a service) platform on our cloud infrastructure as efficiently as possible and delivering high-quality results is essential for us.

Making The Most of It

Knowing our servers enables us to optimize the models we use to be very efficient. It is not hard to get incredible amounts of computational power today, but we still believe that it is worth using them to their fullest potential. We love saving energy and resources while providing our services at reasonable prices.

Machine learning computations are performed both by GPUs (graphics processing units) and CPUs (central processing units). During training, the CPU prepares the images, and the GPU optimizes the artificial neural network. For prediction, both of them can be used. But generally, we use GPU only when the model is huge, or we need the results very fast. In other cases, the CPU is sufficient. Recently, we focused on optimizing our prediction on our Intel Xeon CPUs to make them run faster.

We use the TensorFlow library for our machine-learning models. It was developed by Google and released as open-source in 2015. Google also came up with its own hardware for machine learning: Tensor Processing Units, or TPUs for short. Nevertheless, Google is still interested in other hardware platforms as well, and provides XNNPACK, an optimized library of floating-point neural network inference operators.

OpenVINO Toolkit

However, we achieved the best results by using the OpenVINO Toolkit from Intel. It consists of two basic parts: a model optimizer, which is run after a model is trained, and an inference engine for predictions. We are able to speed up some of our tasks five times. The predictions run especially efficiently on larger images and larger batches.

So far, we are using OpenVINO for our image recognition service and various recognition tasks in different fashion services. For example, fashion tagging or visual search runs many models on a single image and every improvement in efficiency is even more noticeable. But we are not forgetting the rest of our portfolio, and we are working hard to extend it to all the different services we provide.

Last but not least, we would like to thank Intel for their support, and we are looking forward to our cooperation in the next years!

Do you want to know more? Read a blog on the Intel AI website.

The post Behind The Scene: Ximilar and Intel AI Builders Program appeared first on Ximilar: Visual AI for Business.