Software as a Service - Ximilar: Visual AI for Business

We Introduce Plan Overview & Advanced Plan Setup

Zuzana Raidová — Tue, 24 Sep 2024 13:58:05 +0000

We’re excited to introduce new updates to Ximilar App! As a machine learning platform for training and deploying computer vision models, it also lets you manage subscriptions, monitor API credit usage, and purchase credit packs.

These updates aim to improve your experience and streamline plan setup and credit consumption optimization. Here’s a quick rundown of what’s new.

Plan Setup: Simplified Subscription Management

We’ve revamped the subscription page with new features and better functionality. The Plan Setup page now allows you to choose between Free, Business, or Professional plans, customize your monthly credit supply using a slider, and access our new API Credit Consumption Calculator—a handy tool to help you make informed decisions.

Plan setup in Ximilar App.

The entire checkout process has been streamlined as well, allowing you to adjust your payment method directly before completing your purchase.

Go to Plan setup

Explore Pricing plans

Manage Your Payment Methods and Currencies

You can change the default currency for plan setup and payments in the Settings. To update your payment method, simply access the Stripe Portal from your Plan Overview under “More Actions.” If you prefer a different payment method or have any additional questions, feel free to reach out to us!

Credit Calculator: Estimate & Optimise Your Credit Consumption

One of the most exciting additions to the app is the new Credit Calculator, now available directly within the platform. While this tool was previously featured on our Pricing page, it’s now integrated into the app as well, allowing you to not only estimate your credit needs but also preset your subscription plan directly from the calculator.

Once you’ve adjusted your credits based on projected usage, you can proceed straight to checkout, making the entire process of optimizing and purchasing credits smoother and more efficient.

Credit consumption calculator in Ximilar App.

Calculator in App

Calculator at Pricing page

Plan Overview: A Complete View of Your Plans and Credits

The page Plan Overview gives you a comprehensive view of your active subscription, any past plans, and your pre-paid credit packs. Previously, credit information was limited to your dashboard, but now you have detailed insight into your credit usage and plan history.

Plan overview in Ximilar App.

In the Plan Overview, you can view all your current active subscription plans. If you upgrade or downgrade, multiple plans may temporarily appear, as credits from your previous plan remain available until the end of the billing period.

Go to Plan overview

Reports: Detailed Insights into Credit Usage

Our new Reports page enables you to gain deeper insights into your API credit usage. It provides two types of reports: credit consumption by AI solution (e.g., Card Grading) and by individual operation within a solution (e.g., “grade one card” within the Card Grading solution).

Reports in Ximilar App give you detailed insight into your API credit consumption.

See Reports

Credit Packs: Flexibility to Buy Extra Credits Anytime

API Credit packs act as a safety net for unexpected system loads. Now available on their dedicated page, you can purchase additional API credit packs as needed. You can also compare pricing against higher subscription plans and choose the most cost-effective option. Both your active and used credit packs will be displayed on the Plan Overview page.

API Credit packs page in Ximilar App.

Go to Credit packs

Invoices: All Your Purchases in One Place

This updated page neatly lists all your invoices, including both subscription payments and one-time credit pack purchases, ensuring that all your financial information is in one place.

Invoices in Ximilar App.

Go to Invoices

Greater Control & Flexibility For the Users

These updates are designed to provide you with greater control, transparency, and flexibility as you build and deploy visual AI solutions. All of these features are now accessible in your sidebar. Check them out, and feel free to reach out with any questions!

The post We Introduce Plan Overview & Advanced Plan Setup appeared first on Ximilar: Visual AI for Business.

New AI Solutions for Card & Comic Book Collectors

Zuzana Raidová — Wed, 18 Sep 2024 12:35:34 +0000

Recognize and Identify Comic Books in Detail With AI

The newest addition to our portfolio of solutions is the Comics Identification (/v2/comics_id). This service is designed to identify comics from images. While it’s still in the early stages, we are actively refining and enhancing its capabilities.

The API detects the largest comic book in an image, and provides key information such as the title, issue number, release date, publisher, origin date, and creator’s name, making it ideal for identifying comic books, magazines, as well as manga.

Comics Identification by Ximilar provides the title, issue number, release date, publisher, origin date, and creator’s name.

This tool is perfect for organizing and cataloging large comic collections, offering accurate identification and automation of metadata extraction. Whether you’re managing a digital archive or cataloging physical collections, the Comics Identification API streamlines the process by quickly delivering essential details. We’re committed to continuously improving this service to meet the evolving needs of comic identification.

Try how it works

Learn more

Star Wars Unlimited, Digimon, Dragon Ball, and More Can Now Be Recognized by Our System

Our trading card identification system has already been widely used to accurately recognize and provide detailed information on cards from games like Pokémon, Yu-Gi-Oh!, Magic: The Gathering, One Piece, Flesh and Blood, MetaZoo, and Lorcana.

Recently, we’ve expanded the system to include cards from Garbage Pail Kids, Star Wars Unlimited, Digimon, Dragon Ball Super, Weiss Schwarz, and Union Arena. And we’re continually adding new games based on demand. For the full and up-to-date list of recognized games, check out our API documentation.

Ximilar keeps adding new games to the trading card game recognition system. It can easily be deployed via API and controlled in our App.

Try how it works

See the full taxonomy

Detect and Identify Both Trading Cards and Their Slab Labels

The new endpoint slab_grade processes your list of image records to detect and identify cards and slab labels. It utilizes advanced image recognition to return detailed results, including the location of detected items and analyzed features.

Graded slab reading by Ximilar AI.

The Slab Label object provides essential information, such as the company or category (e.g., BECKETT, CGC, PSA, SGC, MANA, ACE, TAG, Other), the card’s grade, and the side of the slab. This endpoint enhances our capability to categorize and assess trading cards with greater precision. In our App, you will find it under Collectibles Recognition: Slab Reading & Identification.

Try how it works

Documentation

Automatic Recognition of Collectibles

Ximilar built an AI system for the detection, recognition and grading of collectibles. Check it out!

New Endpoint for Card Centering Analysis With Interactive Demo

Given a single image record, the centering endpoint returns the position of a card and performs centering analysis. You can also get a visualization of grading through the _clean_url_card and _exact_url_card fields.

The _tags field indicates if the card is autographed, its side, and type. Centering information is included in the card field of the record.

The card centering API by Ximilar returns the position of a card and performs centering analysis.

Try how it works

Documentation

Learn How to Scan and Identify Trading Card Games in Bulk With Ximilar

Our new guide How To Scan And Identify Your Trading Cards With Ximilar AI explains how to use AI to streamline card processing with card scanners. It covers everything from setting up your scanner and running a Python script to analyzing results and integrating them into your website.

Read the guide

Let Us Know What You Think!

And that’s a wrap on our latest updates to the platform! We hope these new features might help your shop, website, or app grow traffic and gain an edge over the competition.

If you have any questions, feedback, or ideas on how you’d like to see the services evolve, we’d love to hear from you. We’re always open to suggestions because your input shapes the future of our platform. Your voice matters!

The post New AI Solutions for Card & Comic Book Collectors appeared first on Ximilar: Visual AI for Business.

New Solutions & Innovations in Fashion and Home Decor AI

Zuzana Raidová — Wed, 18 Sep 2024 12:09:13 +0000

Automate Writing of SEO-Friendly Product Titles and Descriptions With Our AI

Our AI-powered Product Description revolutionizes the way you manage your fashion apparel catalogs by fully automating the creation of product titles and descriptions. Instead of spending hours manually tagging and writing descriptions, our AI-driven generator swiftly produces optimized texts, saving you valuable time and effort.

Ximilar automates keyword extraction from your fashion images, enabling you to instantly create SEO-friendly product titles and descriptions, streamlining the inventory listing process.

With the ability to customize style, tonality, format, length, and preferred product tags, you can ensure that each description aligns perfectly with your brand’s voice and SEO needs. This service is designed to streamline your workflow, providing accurate, engaging, and search-friendly descriptions for your entire fashion inventory.

Try how it works

Enhanced Taxonomy for Accessories Product Tagging

We’ve upgraded our taxonomy for accessories tagging. For sunglasses and glasses, you can now get tags for frame types (Frameless, Fully Framed, Half-Framed), materials (Combined, Metal, Plastic & Acetate), and shapes (Aviator, Cat-eye, Geometric, Oval, Rectangle, Vizor/Sport, Wayfarer, Round, Square). Try how it works on your images in our public demo.

Our tags for accessories cover all visual features from materials to patterns or shapes.

Try how it works

Learn more & get full taxonomy

Automate Detection & Tagging of Home Decor Images With AI

Our new Home Decor Tagging service streamlines the process of categorizing and managing your home decor product images. It uses advanced recognition technology to automatically assign categories, sub-categories, and tags to each image, making your product catalog more organized. You can customize the tags and choose translations to fit your needs.

Try our interactive home decor detection & tagging demo.

The service also offers flexibility with custom profiles, allowing you to rename tags or add new ones based on your requirements. For pricing details and to see the service in action, check our API documentation or contact our support team for help with custom tagging and translations.

Try how it works

Documentation

Visual Search for Home Decor: Find Products With Real-Life Photos

With our new Home Decor Search service, customers can use real-life photos to find visually similar items from your furniture and home decor catalogue.

Our tool integrates four key functionalities: home decor detection, product tagging, colour extraction, and visual search. It allows users to upload a photo, which the system analyzes to detect home decor items and match them with similar products from your inventory.

Our Home Decor Search tool suggests similar alternatives from your inventory for each detected product.

To use Home Decor Search, you first sync your database with Ximilar’s cloud collection. This involves processing product images to detect and tag items, and discarding the images immediately after. Once your data is synced, you can perform visual searches by submitting photos and retrieving similar products based on visual and tag similarity.

The API allows for customized searches, such as specifying exact objects of interest or integrating custom profiles to modify tag outputs. For a streamlined experience, Ximilar offers options for automatic synchronization and data mapping, ensuring your product catalog remains up-to-date and accurate.

Try how it works

Documentation

The post New Solutions & Innovations in Fashion and Home Decor AI appeared first on Ximilar: Visual AI for Business.

The Best Online Tools, Apps, and Services for Card Collectors

Michal Lukáč — Fri, 31 May 2024 12:47:48 +0000

Welcome to the ultimate guide for card collectors! This blog post explores online technologies for individual collectors, small shops, as well as big companies. Whether you are collecting or selling Trading Card Games (TCGs) or Sports Cards, or just looking for inspiration and new technologies, you’ll discover great tools to enhance your collecting experience with sports card recognition technology.

We’ll look at companies developing interesting technologies for sports card recognition. Online card grading solutions (both offline and online, with AI and human graders), marketplaces where you can list cards, scanner companies that automate identification in big warehouses, mobile apps for managing and valuing personal collections, platforms for card investors, and special vaults for storing precious items can all benefit from automated sports card recognition.

Scanners for Sports Card Recognition

Card scanners are becoming very popular and there are tens of companies around the world with different pros and cons, features, capacity, processing speed, and pricing or subscription plans. All of them want to solve the problem of unmanaged warehouses full of cards.

TCG Sync – TCGSync is a very interesting startup that offers tools for cards like scanners (in partnership with Fujitsu), card catalogizing, inventory management, auto pricing and more. They support a lot of TCG types and have their own shop for card scanners that are ready for use. If you sign up for their yearly plan they will even give you a Fujitsu scanner for free and you can start selling on eBay, Shopify or Card Market from day 1.

If you already have a scanner and want to just identify the cards, you can connect to our API service with tools that will do it for you. Read more in my article How to Identify Sports Cards With AI.

Card Dealer Pro – This site is very similar to TCG Sync, but focused on sports cards. You just feed the cards to your scanner, Card Dealer will identify them with AI, propose a price for listing with title and description and publish them to eBay, Shopify or CollX.

Scanners can help with sports card recognition (Source: ricoh.com).

Krono cards – This tool by Kronozio is for scanning and documenting your card inventory in bulk. It’s very similar to the TCG Sync and Card Dealer Pro. However, when you scan a card with a Krono Card, they directly submit it to their own marketplace. This can be a great advantage if you don’t have your own shop and a disadvantage if you don’t want actually to populate their database.

These are three main, but you can also explore several other options. For instance, TCG Machines is a Canadian company providing its own machine. Roca Sorter by TCG Player focuses on the four main TCGs: Magic the Gathering, Yu-Gi-Oh!, Pokémon, and Lorcana. CardCastle is an Australian company with its own scanner and platform to organize your collection. SortSwift is a system for managing your hobby store using the Ricoh scanner.

Sports Card Recognition & Evaluation With Smartphone Apps

Smartphone apps are really useful when you want to just check the card, its condition, or price on the internet. Here are the most popular ones that millions of collectors around the world use daily.

Ludex – A simple card scanner app that helps you identify the sports cards and get the prices using sports card recognition. After that, you can list it on eBay with a few clicks. In their free plan, you can scan up to 200 cards monthly. The plans are currently for 4.99, 9.99 and 24.99 USD enabling more functionalities like customized collections.

CollX – CollX is very similar to Ludex. With this smartphone application, you can snap a photo of your card and get its value in seconds. It’s the most popular smartphone app for card recognition and collection management. Also, the community is pretty active and you can easily submit a card to their own marketplace. It can tell the actual marketplace value and find similar listings.

Cardstock: Price Sports Cards: This app by Cardstock helps you with the identification and valuing of your cards. It’s designed for iPad and it can analyze baseball cards with great accuracy. This app is great for individual collectors, go and try it!

Cardstock enables you to scan your card and match it with their database. You approve the identification or select the right match in the variations.

Collectr – Collectr is a great application for TCGs, that updates the value of the cards daily. You can manage your inventory and see the total invested value in the cards. I’m using the APP myself and the card scanning technology works great. My portfolio of cards is growing on value – it’s quite addictive

Sports Card Investor – It’s a great website with a smartphone app for everyone interested in investing in sports cards. You can see which cards are trending, you can search cards by complex queries, view recent sale prices and look at how the card is trending over the past. There are a lot of articles, resources, tips and also a very active community on Discord and social networks.

PSA Set Registry – This enables tracking your inventory of PSA-graded cards, seeing the populations and updating your own sets/collections. Basically, it is a gamification of collecting: you can compare your collection to others and compete on the leaderboard with your cards or get achievements or awards for collectors.

Online and Offline Card Grading Services

Card Grading is submitting and evaluating the quality and origin of the cards by third-party service with final sealing cards to the slabs. The grading usually increases the value of the cards (by demand) and makes the cards protected.

PSA, Beckett & CGC – There are several standard grading companies and the most popular are the PSA, Beckett and CGC. The PSA has the largest market share and the cards they grade generally have higher value than those from other companies. CGC first started with the comics but they are also doing cards now with currently refreshed labels.

Ace Grading – Ace Grading is a company from the UK with really cool slab labels. At this moment, they focus mostly on TCGs like Pokemons, Lorcana, or Magic The Gathering. This is a very good option for non-US citizens. The pricing is transparent with great support.

Ace Grading’s slab labels. (Source: acegrading.com)

Tag Grading – Sometimes human graders can make a big mistake or can be very subjective during the grading process. Submitting to standard companies like PSA or Beckett can be very shady. That is why some companies try to develop grading based on computer vision. TAG Grading is a startup that develops its own technology for card grading. They use a scanner and AI models that can grade a card with accuracy, transparency and consistency. When TAG grades your card, you also get the grading report with an explanation of the grade. I think this is the way the grading should be done in future.

There are several online tools that you can use as individual collectors. For example, EdgeGrading provides a great web tool for getting the centering score. You simply upload the image of your card and adjust the Left / Right and Top / Bottom offsets.

SportscardsPro is also offering a centring web-based tool. However, the card photo cannot be scanned and there must be some background around the card. TrueGrade on the other hand is a smartphone app that grades cards based on the evaluation of centering, corners, edges and surfaces. A website alternative to TrueGrade is TCGrader – an AI-powered Pokemon grading system.

We built several useful tools for AI-powered sports card recognition, trading card game identification, card grading, and search. Read more in the articles and let me know what you think.

Online Platforms for Collectors

There are several platforms that can help you manage your collection, connect with the community, price the cards, sell them and much more.

Card Ladder – Card Ladder (by collectors.com) is a great platform for finding the value of your cards – including historical prices from several marketplaces like eBay, Goldin, Heritage or MySlabs and population reports. It also offers complex analytics and can track your collection. Personally, I like features like the charts of historical prices for my collection or notifications (price alerts) when some cards hit specified prices. They offer a free trial. Then, you need to subscribe for 15 USD/Month.

CardLadder helps collectors find the value of their trading cards.

Price Charting – In my opinion, this is an awesome website, not only for trading cards but for comic books, video games, lego sets and coins as well. It offers you a search-by-photo functionality for selected card types/games. Our favourite function is an API for finding the value of ungraded or graded cards. The value for graded cards can be categorized per grade which many collectors consider to be the best source for price identification.

Collect i bles.com – Collectibles are a quite new project with several interesting features. They have their own mobile application for iOS and Android. You can create your collection of cards, coins, stamps, memorabilia or comic books (with a showcase feature). The mobile app can scan and identify the items via AI-enhanced image recognition and add them to your collection. You can track the value of your collection so you can get better insights. Moreover, it has an active community where you can connect with other collectors, which is a big plus.

Collectors.com – Collectors are currently one of the largest market players (it is a site of Collectors Holdings, Inc.) when it comes to sports cards and trading card games. It has several divisions, one of them is a popular card grading company PSA. It also acquired companies like Wata Games. Currently, their app helps with managing your collections, selling them or sending them to PSA Vault. The PSA Vault is a cool service that helps you securely store your collector’s items with the opportunity to publicly sell them on their marketplace or on eBay.

Cardbase – Lastly, Cardbase is a platform designed for trading card enthusiasts to search, discover, and manage their collections. It aggregates prices and availability from over 30 marketplaces and auction houses, allowing users to track card values, view price trends, and find deals. Key features include comprehensive collection management tools, price tracking, and a mobile app for on-the-go access. Additionally, the site provides useful articles, guides, and resources for collectors.

Collectibles Marketplaces

If you are looking to sell your cards there are big sites as well as smaller ones, specialising in sports cards and TCGs. In general, you can always sell your card on eBay, but if you have a really expensive card then maybe you can try some auction house.

Sportlots.com – This is an amazing marketplace where you can get low-end sports cards very cheaply. In total, it lists more than 80 million cards from over 1000 sellers. The website has a kind of 90’s vibe but it has a lot of reputable sellers. Also, you can save a lot with their box system. That means that during the checkout, you can ship cards to your personal box and once you gather a good amount of cards you can ship them at once.

COMC.com – Check Out My Collectibles is a large marketplace and auction for all the card types. If you have more than a few hundred sports cards you should probably try to sell them via this marketplace.

Goldin.co – Goldin is a well-known auction house (goldin was acquired by eBay) that specializes in sports memorabilia, trading cards and sports cards. They are hosting high-profile auctions featuring rare and valuable items. The site is so popular that the founder Ken Goldin was featured in their own Netflix series King of Collectibles, The Goldin Touch. The Goldin is also a marketplace with tens of thousands of listings. Similar to the Goldin marketplace there is also PWCC which offers auctions, vaults and a marketplace.

The Most Popular Marketplaces

Cardmarket – This is originally a German company that offers a marketplace for your cards. It’s also the most popular marketplace in Europe. Just sign up and you can sell your singles, booster boxes or sealed products in minutes. It is very similar to eBay (each seller has a profile with reviews) but specialized in games like Pokemon, Dragon Ball, One Piece and others.

TCG Player – This is also one of the most popular marketplaces for selling trading card games (seller accounts), their marketplace supports a large number of games. The site was acquired by eBay in 2022. It has a lot of features, a mobile app, inventory management, and great customer support. They also offer developer tools like API for knowing the price of the card.

Japanese Card Marketplaces

In some cases, the Japanese sites can be very useful because cards are very popular in Japan and it’s a big market for sellers and buyers. So I picked a few that you should check out.

Cardotaku.com – is quite a great site developed for getting Japanese variations of cards. Started as a one-man business, and its popularity is growing. For the Japanese version of Magic The Gathering, we recommend checking Hareruya and Bigweb.

On TCG Republic you can find cards from various games. In general, I would recommend also checking out classic eBay and Mercari.com with their trading cards and collectibles sections.

E-Commerce Platforms

Do you need your own e-commerce solution with inventory management and many other features? Then try one of these platforms.

BinderPOS – BinderPOS is a solution that can run on top of your Shopify store and help you with the collectibles inventory. Originally from New Zealand, it quickly raised popularity among game stores worldwide.

CrystalCommerce – This is an in-store & online e-commerce platform for collectibles. A very similar solution to BinderPOS which helps you sell the stuff to several sales channels (such as eBay, Amazon, TCGplayer, and others). It’s easy to set up and you can pick from several website themes.

Storepass – Storepass is marketed as software for board games and TCG stores. It’s a generic platform on top of your e-commerce site like Shopify or BigCommerce. You can automatically access TCG market prices from TCG Player, manage your product inventory, edit the cards in bulk, and much more.

Other Projects

Lastly, I want to mention several other interesting projects, which do not offer typical services but can be very helpful for individual collectors.

For Card Pricing & Shipment

Mavin.io and Card Mavin – Mavin is a search engine for collectibles, you can get insights into what your collectibles are worth. Similar to pricecharting, they are offering the API for developers. So you can simply get the actual and historical prices for cards, comics or coins.

ShipMyCards – Shipmycards is an interesting project that can become your tax-efficient storage facility with your own USA shipping address. The main business is in the cards but they also support vinyl records, magazines, comic books, memorabilia, or even shoes. In general, you will get your own US address where you send your orders from eBay or other marketplaces. They will help you with collecting, grading, insurance, and final shipment. Great for people outside North America.

For Magic the Gathering and Other TCGs

Card Conduit – Have you found your collection of Magic cards from your teenage times and do you want to sell them? The card conduit is a really smart way how to sell Magic The Gathering in a very transparent and easy way. You simply send your cards via postage and they will price them and sell them for you. You exactly know how much you get for each of the cards because they can automatically identify them and get the best price for your cards. This is a very nice tool with amazing support.

META TCG is a project similar to Card Conduit but focused on Pokémon, Magic The Gathering and Yu-Gi-Oh! You just send your bulk submission via the post office and you get your payments via PayPal.

To Keep Up With News & Stats

CardLines – CardLines is a website where collectors can get information and read news related to sports cards, trading cards and other collectibles. The articles are released daily and if you are an active collector, this one is great to read. The site is trying to monitor the latest releases and there are a lot of tips for collectors. It also has its own small e-shop where you can buy some hobby boxes.

Universal Pop Report by Gemrate – is an amazing site for getting population reports and statistics of cards. The best thing is the grading stats for major grading companies – with this, you will know how many of the cards were graded by PSA or Beckett. In their blog, you can find the grading recap where you can find monthly statistics.

Universal Pop Report helps with population reports and statistics of cards.

Sports Cards Calendar – This is a great way how to stay updated on upcoming sports cards. On the cardboard connection website, you can find checklists for almost all the sets.

Visual AI Infrastructure for Collectibles by Ximilar

Lastly, I would like to list the solutions we’ve been building for businesses such as collector marketplaces, comparison websites, card dealers, and their mobile applications. We are a SaaS company, focusing on AI, computer vision and visual data, so our tools can be used online via REST API.

Simply said, when it comes to AI for collectibles, we get quite enthusiastic. Currently, we provide:

The most accurate sports card recognition API in the world – we can recognize parallels and holographic variants with great accuracy, and our growing database already contains millions of cards
The most accurate trading card identification API in the world. For instance, we are achieving 99 % accuracy for Magic The Gathering, Pokemon and Yu-Gi-Oh!
An online card grading and card condition system for smartphone cameras and scanners, also available via API
Visual search and similarity systems for collectibles, which enable your customers to search your catalog of collectibles via pictures
Product description systems (described here on fashion use case) – for automatization of submission of products on your marketplace
Image upscaling system – the resolution of your images can be enhanced up to 8x easily
Grading slab reading with OCR, which extracts the content of the slab, such as grade, the certificate number, year, and player and card name
Background removal system – in case you need to exactly cut out the card

Our systems are built to analyze large datasets with speed & accuracy. They’re ready to use right away and customizable for specific image collections.

We are continuously improving the models, extending our sports card database and enhancing the speed of the recognition process. We are improving the parallels/refractors identification of sports cards, and our TCG identifier can manage language variations (US, Japanese, Chinese, Korean, …) and different editions (1st edition of Pokemons, MTGs editions). If you would like to help with an API integration, we are here to help. Just reach out via our chat or contact form.

Try our public demos

The post The Best Online Tools, Apps, and Services for Card Collectors appeared first on Ximilar: Visual AI for Business.

How Fashion Tagging Works and Changes E-Commerce?

Zuzana Raidová — Wed, 22 May 2024 10:05:34 +0000

Keeping up with the constantly emerging trends is essential in the fashion industry. Beyond shifts in cuts, materials, and colours, staying updated on technological trends has become equally, if not more, crucial in recent years. Given our expertise in Fashion AI, let’s take a look at the key technologies reshaping the world of fashion e-commerce, with a particular focus on a key Fashion AI tool: automated fashion tagging.

AI’s Impact on Fashion: Turning the Industry on Its Head

The latest buzz in the fashion e-commerce realm revolves around visual AI. From AI-powered fashion design to AI-generated fashion models, and all the new AI tools, which rapidly change our shopping experience by quietly fueling the product discovery engines in the background, often unnoticed.

Key AI-Powered Technologies in Fashion E-Commerce

So what are the main AI technologies shaking up fashion e-commerce lately? And why is it important to keep up with them?

Recognition, Detection & Data Enrichment in Fashion

In the world of fashion e-commerce, time is money. Machine learning techniques now allow fashion e-shops to upload large unstructured collections of images and extract all the necessary information from them within milliseconds. The results of fashion image recognition (tags/keywords) serve various purposes like product sorting, filtering, searching, and also text generation.

AI can automatically assign relevant tags and save you a significant amount of money and time, compared to the manual process.

These tools are indispensable for today’s fashion shops and marketplaces, particularly those with extensive stock inventories and large volumes of data. In the past few years, automated fashion tagging has made time-consuming manual product tagging practically obsolete.

Try how it works

Learn more

Generative AI Systems for Fashion

The fashion world has embraced generative artificial intelligence almost immediately. Utilizing advanced AI algorithms and deep learning, AI can analyze images to extract visual attributes such as styles, colours, and textures, which are then used to generate visually stunning designs and written content. This offers endless possibilities for creating personalized shopping experiences for consumers.

Different attributes extracted during the product tagging process can directly serve for titles and descriptions. You can set the style and length, or choose important attributes.

Our AI also enables you to automate the writing of all product titles and product descriptions via API, directly utilizing the product attributes extracted with deep tagging and letting you select the tone, length, and other rules to get SEO-friendly texts quickly. We’ll delve deeper into this later on.

Fashion Discovery Engines and Recommendation Systems

Fashion search engines and personalized recommendations are game-changers in online shopping. They are powered by our speciality: visual search. This technology analyzes images in depth to capture their essence and search vast product catalogs for identical or similar products. Three of its endless uses are indispensable for fashion e-commerce: similar items recommendations, reverse image search and image matching.

Personalized experiences and product recommendations are essential for high engagement of customers.

Visual search enables shoppers to effortlessly explore new styles, find matching pieces, and stay updated on trends. It allows you to have your own visual search engine, that rapidly scans image databases with millions of images to provide relevant and accurate search results within milliseconds. This not only saves you time but also ensures that every purchase feels personalized.

Try how it works

Learn more

Shopping Assistants in Fashion E-Commerce and Retail

The AI-driven assistants guide shoppers towards personalized outfit choices suited for any occasion. Augmented Reality (AR) technology allows shoppers to virtually try on garments before making a purchase, ensuring their satisfaction with every selection. Personalized styling advice and virtual try-ons powered by artificial intelligence are among the hottest trends developed for fashion retailers and fashion apps right now.

Both fashion tags for occasions extracted with our automated product tagging, as well as similar item recommendations, are valuable in systems that assist customers in dressing appropriately for specific events.

My Fashion Website Needs AI Automation, What Should I Do?

Consider the Needs of Your Shoppers

To provide the best customer experience possible, always take into account your shoppers’ demographics, geographical location, language preferences, and individual styles.

However, predicting style is not an easy task. But by utilizing AI, you can analyze various factors such as user preferences, personal style, favoured fashion brands, liked items, items in their shopping baskets, and past purchases. Think about how to help them discover items aligned with their preferences and receive only relevant suggestions that inspire rather than overwhelm them.

There are endless ways to improve a fashion e-shop. Always keep in mind not to overwhelm the visitors, and streamline your offer to the most relevant items.

While certain customer preferences can be manually set up by users when logging into an app or visiting an e-commerce site, such as preferred sizes, materials, or price range, others can be predicted. For example, design preferences can be inferred based on similarities with items visitors have browsed, liked, saved, or purchased.

Three Simple Steps to Elevate Your Fashion Website With AI

Whether you run a fashion or accessories e-shop, or a vintage fashion marketplace, using these essential AI-driven features could boost your traffic, improve customer engagement, and get you ahead of the competition.

Automate Product Tagging & Text Generation

The image tagging process is fueled by specialised object detection and image recognition models, ensuring consistent and accurate tagging, without the need for any additional information. Our AI can analyze product images, identify all fashion items, and then categorize and assign relevant tags to each item individually.

In essence, you input an unstructured collection of fashion images and receive structured metadata, which you can immediately use for searching, sorting, filtering, and product discovery on your fashion website.

AI image tagging relies on neural networks and deep learning techniques. We only assign product attributes with a certain level of confidence, highlighted in green in our demo.

The keywords extracted by AI can serve right away to generate captivating product titles and descriptions using a language model. With Ximilar, you can pre-set the tone and length, and even set basic rules for AI-generated texts tailored for your website. This automates the entire product listing process on your website through a single API integration.

Try Fashion Tagging

Streamline and Automate Collection Management With AI

Visual AI is great for inventory management and product gallery assembling. It can recognize and match products irrespective of lighting, format, or resolution. This enables consistent image selection for product listings and galleries.

You can synchronise your entire fashion apparel inventory via API to ensure continual processing by up-to-date visual AI. You can either set the frequency of synchronization (e.g., the first day of each month) or schedule the synchronization run every time you add a new addition to the collection.

A large fashion e-commerce store can have millions of fashion images. AI can sort images in product galleries and references based purely on visual attributes.

For example, you can showcase all clothing items on models in product listings or display all accessories as standalone photos in the shopping cart. Additionally, you can automate tasks like removing duplicates and sorting user-generated visual content, saving a lot of valuable time. Moreover, AI can be used to quickly spot inappropriate and harmful content.

Learn more

Provide Relevant Suggestions & Reverse Image Search

During your collection synchronisation, visual search processes each image and each product in it individually. It precisely analyzes various visual features, such as colours, patterns, edges and other structures. Apart from the inventory curation, this will enable you to:

Have your custom fashion recommendation system. You can provide relevant suggestions from your inventory anywhere across the customer journey from the start page to the kart.
Improve your website or app with a reverse image search tool. Your visitors can search with smartphone photos, product images, pictures from Pinterest, Instagram, screenshots, or even video content.

Looking for a specific dress? Reverse image search can provide relevant results to a search query, independent of the quality or source of the images.

Since fashion detection, image tagging and visual search are the holy trinity of fashion discovery systems, we’ve integrated them into a single service called Fashion Search. Check out my article Everything You Need to Know About Fashion Search to learn more.

Visual search can match images, independent of their origin (e.g., professional images vs. user-generated content), quality and format. We can customize it to fit your collection, even for vintage pieces, or niche fashion brands. For a firsthand experience of how basic fashion visual search operates, check out our free demo.

Search Fashion by Photo

How Does the Automated Fashion Tagging Work?

Let’s take a closer look at the basic AI-driven tool for the fashion industry: automated fashion tagging. Our product tagging is powered by a complex hierarchy of computer vision models, that work together to detect and recognize all fashion products in an image. Then, each product gets one category (e.g., Clothing), one or more subcategories (e.g., Evening dresses or Cocktail dresses), and a varied set of product tags.

To name a few, fashion tags describe the garment’s type, cut, fit, colours, material, or patterns. For shoes, there are features such as heels, toes, materials, and soles. Other categories are for instance jewellery, watches, and accessories.

In the past, assigning relevant tags and texts to each product was a labor-intensive process, slowing down the listing of new inventory on fashion sites. Image tagging solved this issue and eliminated the risk of human error.

The fashion taxonomy encompasses hundreds of product tags for all typical categories of fashion apparel and accessories. Nevertheless, we continually update the system to keep up with emerging trends in the fashion industry. Custom product tags, personal additions, taxonomy mapping, and languages other than the default English are also welcomed and supported. The service is available online – via API.

Get the Full Ximilar Fashion Taxonomy

How Do I Use the Automated Fashion Tagging API?

You can seamlessly integrate automated fashion tagging into basically any website, store, system, or application via REST API. I’d suggest taking these steps first:

First, log into Ximilar App – After you register into Ximilar App, you will get the unique API authentication token that will serve for your private connection. The App has many useful functions, which are summarised here. In the past, I wrote this short overview that could be helpful when navigating the App for the first time.

If you’d like to try creating and training your own additional machine learning models without coding, you can also use Ximilar App to approach our computer vision platform.

Secondly, select your plan – Use the API credit consumption calculator to estimate your credit consumption and optimise your monthly supply. This ensures your credit consumption aligns with the actual traffic on your website or app, maximizing efficiency.

Use Ximilar’s credit consumption calculator to optimise your monthly supply.

And finally, connect to API – The connection process is described step by step in our API documentation. For a quick start, I suggest checking out First Steps, Authentication & Image Data. Automated Fashion Tagging has dedicated documentation as well. However, don’t hesitate to reach out anytime for guidance.

Do You Need Help With the Setup?

Our computer vision specialists are ready to assist you with even the most challenging tasks. We also welcome all suggestions and custom inquiries to ensure our solutions meet your unique needs. And if you require a custom solution, our team of developers is happy to help.

We also offer personalized demos on your data before the deployment, and can even provide dedicated server options or set up offline solutions. Reach out to us via live chat for immediate assistance and our team will guide you through the entire process. Alternatively, you can contact us via our contact page, and we will get back to you promptly.

How do custom projects work?

The post How Fashion Tagging Works and Changes E-Commerce? appeared first on Ximilar: Visual AI for Business.

How to Identify Sports Cards With AI

Michal Lukáč — Mon, 12 Feb 2024 11:47:38 +0000

We have huge news for the collectors and collectibles marketplaces. Today, we are releasing an AI-powered system able to identify sports cards. It was a massive amount of work for our team, and we believe that our sports card identification API can benefit a lot of local shops, small and large businesses, as well as individual developers who aim to build card recognition apps.

Sports Cards Collecting on The Rise

Collecting sports cards, including hockey cards, has been a popular hobby for many people. Especially during my childhood, I collected hockey cards, as a big fan of the sport. Today, card collecting has evolved into an investment, and many new collectors enter the community solely to buy and sell cards on various marketplaces.

Some traditional baseball rookie cards can have significant value, for example, the estimated price of a vintage Mickey Mantle PSA 10 1952 Topps rookie baseball card is $15 million – $30 million.

Our Existing Solutions for Card Collector Sites & Apps

Last year, we already released several services focused on trading cards:

First, we released a Trading Card Game Identifier API. It can identify trading card games (TCGs), such as Pokémon, Magic The Gathering: MTG and Yu-Gi-Oh!, and more. We believe that this system is amongst the fastest, most precise and accurate in the world.
Second, we built a Card Grading and fast Card Conditioning API for both sports and trading card games. This service can instantly evaluate each corner, edges, and surface, and check the centring in a card scan, screenshot or photo in a matter of seconds. Each of these features is graded independently, resulting in an overall grade. The outputs can be both values or conditions-based (eBay or TCGPlayer naming). You can test it here.
We have also been building custom visual search engines for private collections of trading cards and other collectibles. With this feature, people can visit marketplaces or use their apps to upload card images, and effortlessly search for identical or similar items in their database with a click. Visual search is a standard AI-powered function in major price comparators. If a particular game is not on our list, or if you wish to search within your own collection, list, or portfolio of other collectibles (e.g., coins, stamps, or comic books), we can also create it for you – let us know.

We have been gradually establishing a track record of successful projects in the collectibles field. From the feedback of our customers, we hear that our services are much more precise than the competition. So a couple of months ago, we started building a sports card scanning system as well. It allows users to send the scan to the API, and get back precise identification of the card.

Our API is open to all developers, just sign up to Ximilar App, and you can start building your own great product on top of it!

Test it Now in Live Demo

This solution is already available for testing in our public demo. Try it for free now!

The Main Features of Sports Cards

There are several factors determining the value of the card:

Rarity & Scarcity: Cards with limited production runs or those featuring star players are often worth more.
Condition: Like any collectible item, the condition of a sports card is crucial. Cards in mint or near-mint condition are generally worth more than those with wear and tear.
Grade & Grading services: Graded cards (from PSA or Beckett) typically have higher prices in the market.
The fame of the player: Names of legends like Michael Jordan or Shohei Ohtani instantly add value to the trading cards in your collection.
Autographs, memorabilia, and other features, that add to the card’s rarity.

Each card manufacturer must have legal rights and licensing agreements with the sports league, teams, or athletes. Right now, there are several main producers:

Panini – This Italian company is the largest player in the market in terms of licensing agreements and number of releases.
Topps – Topps is an American company with a long history. They are now releasing cards from Baseball, Basketball or MMA.
Upper Deck – Upper Deck is a company with an exclusive license for hockey cards from the NHL.
Futera – Futera focuses mostly on soccer cards.

Example of Upper Deck, Futera, Panini Prizm and Topps Chrome cards.

Dozens of other card manufacturers were acquired by these few players. They add their brands or names as special sets in their releases. For example, the Fleer company was acquired by Upper Deck in 2005 and Donruss was bought by Panini.

Identifying Sports Cards With Artificial Intelligence

When it comes to sports cards, it’s crucial to recognize that the identification challenge is more complex than that of Pokémon or Magic The Gathering cards. While these games present challenges such as identical trading card artworks in multiple sets or different language variants, sports cards pose distinct difficulties in recognition and identification, such as:

Amount of data/cards – The companies add a lot of new cards into their portfolio each year. As of the latest date, the total figure exceeds tens of millions of cards.
Parallels, variations, and colours – The card can have multiple variants with different colours, borders, various foil effects, patterns, or even materials. More can be read in a great article by getcardbase.com. Look at the following example of the NBA’s LeBron James card, and some of its variants.

LeBron James 2021 Donruss Optic #41 card in several variations of different parallels and colors.

Special cards: Short Print (SP) and Super Short Print (SSP) cards are intentionally produced in smaller quantities than the rest of the particular set. The most common special cards are Rookie cards (RC) that feature a player in their rookie season and that is why they hold sentimental and historical value.
Serial numbered cards: A type of trading cards that have a unique serial number printed directly on the card itself.
Authentic signature/autograph: These are usually official signature cards, signed by players. To examine the authenticity of the signature, and thus ensure the card’s value, reputable trading card companies may employ card authentication processes.
Memorabilia: In the context of trading cards, memorabilia cards are special cards that feature a piece of an athlete’s equipment, such as a patch from a uniform, shoe, or bat. Sports memorabilia are typically more valuable because of their rarity. These cards are also called relic cards.

As you can see, it’s not easy to identify the card and its price and to keep track of all its different variants.

Example: Panini Prizm Football Cards

Take for example the 2022 Panini Prizm Football Cards and the parallel cards. Gold Prizms (10 cards) are worth much more than the Orange Prizms (with 250 cards) because of their scarcity. Upon the release of a card set, the accompanying checklist, presented as a population table, is typically made available. This provides detailed information about the count for each variation.

2022 Panini Prizm Football Cards examples. (Source: beckett.com)

Next, for Panini Prizm, there are more than 20 parallel foil patterns like Speckle, Hyper, Diamond, Fast Break/Disco/No Huddle, Flash, Mozaic, Mojo, Pulsar, Shimmer, etc. with all possible combinations of colours such as green, blue, pink, purple, gold, and so on.

These combinations matter because some of them are more rare than others. There are also different names for the foil cards between companies. Topps has chrome Speckle patterns which are almost identical to the Panini Prizm Sparkle pattern.

Lastly, no database contains each picture for every card in the world. This makes visual search extremely hard for cards that have no picture on the internet.

If you feel lost in all the variations and parallels cards, you are not alone.

Luckily, we developed (and are actively improving) an AI service that is trying to tackle the mentioned problems with sports cards identification. This service is available on click as an open REST API, so anyone can connect to develop and integrate their system with ours. The results are in seconds and it’s one of the fastest services available in the market.

How to Identify Sports Cards Via API?

In general, you can use and connect to the REST API with any programming language like Python or Javascript. Our developer’s documentation will serve you as a guide with many helpful instructions and tips.

To access our API, sign in Ximilar App to get your unique API authentication token. You will find the administration of your services under Collectibles Recognition. Here is an example REST Request via curl:

$ curl https://api.ximilar.com/collectibles/v2/sport_id -H "Content-Type: application/json" -H "Authorization: Token __API_TOKEN__" -d '{
    "records": [
        { "_url": "__PATH_TO_IMAGE_URL__"}
    ], "slab_id": false
}'

The example response when you identify sports cards with Ximilar API.

The API response will be as follows:

When the system succesfuly indetifies the card, it will return you full identification. You will get a list of features such as the name of the player/person, the name of the set, card number, company, team and features like foil, autograph, colour and more. It is also able to generate URL links for eBay searches so you can check the card values or purchase them directly.
If we are not sure about the identification (or we don’t have a specific card in our system) the system will return empty search results. In such case, feel free to ask for support.

How AI Sports Cards Identification Works?

Our identification system uses advanced machine learning models with smart algorithms for post-processing. The system is a complex flow of models that incorporates visual search. We trained the system on a large amount of data, curated by our own annotation team.

First, we identify the location of the card in your photo. Second, we do multiple AI analyses of the card to identify whether it has autograph and more. The third step is to find the card in our collection with visual search (reverse image search). Lastly, we use AI to rerank the results to make them as precise as possible.

What Sports Cards Can Ximilar Identify?

Our sports cards database contains a few million cards. Of course, this is just a small subset of all collectible cards that were produced. Right now we focus on 6 main domains: Baseball cards, Football cards, Basketball cards, Hockey cards, Soccer and MMA, and the list expands based on demand. We continually add more data and improve the system.

We try to track and include new releases every month. If you see that we are missing some cards and you have the collection, let us know. We can agree on adding them to training data and giving you a discount on API requests. Since we want to build the most accurate system for card identification in the world, we are always looking for ways to gather more cards and improve the software’s accuracy.

Who Will Benefit From AI-Powered Sports Cards Identifier?

Access to our REST API can improve your position in the market especially if:

You own e-commerce sites/marketplaces that buy & sell cards – If you have your own shop, site or market for people who collect cards, this solution can boost your traffic and sales.
You are planning to design and publish your own collector app and need an all-in-one API for the recognition and grading of cards.
You want to manage, organize and add data to your own card collection.

Is My Data Safe?

Yes. First of all, we don’t save the analysed images. We don’t even have so much storage capacity to store each analysed image, photo, scan and screen you add to your collection. Once our system processes an image, it removes it from the memory. Also, GDPR applies to all photos that enter our system. Read more in our FAQs.

How Fast is the System, Can I Connect it to a Scanner?

The system can identify one card scan in one second. You can connect it to any card scanner available in the market. The scanning outputs the cards into the folders, to which you can apply a script for card identification.

Sports Cards Recognition Apps You Can Build With Our API

Here are a few ideas for apps that you can build with our Sport Card Identifier and REST API:

Automatic card scanning system – create a simple script that will be connected to our API and your scanners like Fujitsu fi-8170. The system will be able to document your cards with incredible speed. Several of our customers are already organizing their collections of TCGs (like Magic The Gathering or Pokémon) and adding new cards on the go.
Price checking app or portfolio analysis – create your phone app alternative to Ludex or CollX. Start documenting the cards by taking pictures and grading your trading card collection. Our system can provide card IDs, pre-grade cards, and search them in an online marketplace. Easily connect with other collectors, purchase & sell the cards. Test our system’s ability to provide URLs to marketplaces here.
Analysing eBay submission – would you like to know what your card’s worth and how many are currently available in the market? For how much was the card sold in the past? Track the price of the card over time? Or what is the card population? With our technology, you can build a system that can analyse it.

AI for Trading Cards and Collectors

So this is our latest narrow AI service for the collector community. It is quite easy to integrate it into any system. You can use it for automatic documentation of your collection or simply to list your cards on online markets.

For more information, contact us via chat or contact page, and we can schedule a call with you and talk about the technical and business details. If you want to go straight and implement it, take look at our developer’s API documentation and don’t hesitate to ask for guidance anytime.

Right now we are also working on Comics identification (Comic book, magazines and manga). If you would like to hear more then just contact us via email or chat.

Try our public demos

The post How to Identify Sports Cards With AI appeared first on Ximilar: Visual AI for Business.

The Best Tools for Machine Learning Model Serving

Michal Lukáč — Wed, 25 Oct 2023 09:26:42 +0000

As the prevalence of AI in various industries increases, so does the need to optimize the machine learning model serving. As a machine learning engineer, I’ve seen that training models is just one part of the ML journey. Equally important as the other challenges is the careful selection of deployment strategies and serving systems.

In this article, we’ll delve into the importance of selecting the right tools for machine learning model serving, and talk about their pros and cons. We’ll explore various deployment options, serving systems like TensorFlow Serving, TorchServe, Triton, Ray Serve, and MLflow, and also the deployment of specific models such as large language models (LLMs). I’ll also provide some thoughts and recommendations for navigating this ever-evolving landscape.

Machine Learning Models Serving Then and Now

When I first began my journey in the world of machine learning, the landscape was constantly shifting. The frameworks being actively developed and used at the time included Caffee, Theano, TensorFlow (Google) and PyTorch (Meta), all vying for their place in the world of AI. As time has passed, the competition has become more and more lopsided, with TensorFlow and PyTorch leading the way. While TensorFlow has remained the more popular choice for production-ready models, PyTorch has been steadily gaining in popularity, particularly within research circles, for its faster, more intuitive prototyping capabilities.

While there are hundreds of libraries available to train and optimize models, the most popular frameworks such as TensorFlow, PyTorch and Scikit-Learn are all based on Python programming language. Python is often chosen due to its simplicity and the vast amount of libraries for data manipulation. However, it is not the fastest language and can present problems with parallel processing, threads and GIL. Additionally, specialized libraries such as spaCy and PyG are available for specific tasks, such as Natural Language Processing (NLP) and Graph Analysis, respectively. The focus was and still partially is on the optimization of models and architectures. On the other hand, there are more and more problems in machine learning models serving in production because of the large-scale adoption of AI.

Nowadays, even more complex models like large language models (LLM, GPT/LAMMA/BARD) and multi-modal models are in fashion which creates a bigger pressure on optimal model deployment, infrastructure environment and storage capacity. Making machine learning model serving and deployment effective and cheap is a big problem. Even companies like Microsoft or NVIDIA are actively working on solutions that will cut the costs of it. So let’s look into some of the best options that we as developers currently have.

The Machine Learning and DevOps Challenges

Being a Machine Learning Engineer, I can say that training a model is just a small part of the whole lifecycle. Data preparation, deployment process and running the model smoothly for numerous customers is a daily challenge and a major part of the job.

Deployment Strategies

In addition to having to allocate GPU/CPU resources and manage inference speed, the company deploying ML models must also consider the deployment strategy for the trained model. You could be deploying the ML model as an API, running it in a container, or using a serverless platform. Each of these options comes with its own set of benefits and drawbacks, so carefully considering the best approach is essential. When we have a trained model, there are several options on how to use it:

Deploy it as an API endpoint, sending data in the request and getting results immediately in response. This approach is suitable for faster models that are able to process the data in just a few seconds.
Deploy it as an API endpoint, but return just a promise or asynchronous response from the model. This is great for computational-intensive models that can take minutes or hours of processing. For example, generative models and upscaling models are slow and require this approach.
Use a system that is able to serve it for you.
Use the model locally on your data.
Deploy models on Smartphones or IoT devices with feed from local sensors.

Other Challenges

The complexity of machine learning projects grows with variables such as:

The number of models – It is common practice to use multiple models. For example, at this moment, there are tens of thousands of different ML models on the Ximilar platform.
Model versions – You can train each of your models on different training data (part of the dataset) and mark it as a different version. Model versioning is great if you want to A/B test your ML model, tune your model performance, and for continuous model training.
Format of models – You can potentially train and save your ML models in various formats. For instance, .h5 which is a Keras/TensorFlow format or .pt (PyTorch) or .onnx for ONNX Runtime. Usually, each framework supports only specific formats.
The number of frameworks – Served ML models could be trained with different frameworks and their versions.
The number of the nodes (servers) – Models can be hosted on one or multiple servers and the serving system should be able to intelligently load balance the requests on servers so that none of them is throttled.
Models storage/registry – You need to store the ML models in some database or storage, such as AWS S3 or local storage
Speed/performance – The loading time of models from the storage can be critical and can cause a slow inference per sample.
Easy to use – Calling model via Rest API or gRPC requests, single-or-batch inference.
Hardware specification – ML models can be deployed on Edge devices or PCs with various architectures.
GPUs vs CPUs and libraries – Some models must be used only on CPUs and some require a GPU card.

Our Approach to the Machine Learning Model Serving

Several systems were developed to tackle these problems. Serving and deploying machine learning models has come a long way since we founded Ximilar in 2016. Back then, no system was capable of effectively serving hundreds of neural networks for inference.

So, we decided to build our own system for machine learning model serving, and today it forms the backbone of our machine-learning platform. As the use of AI becomes more widespread in companies, newer systems such as TensorFlow Serving emerge quickly to meet the increasing demand.

Which Framework Is The Best?

The Battle of Machine Learning Frameworks

Nowadays, each big tech company has their own solution for machine learning model serving and training. To name a few, PyTorch (TorchServe) and AITemplate by META (Facebook), TensorFlow (TFServing) by Google, ONNX runtime by Microsoft, Triton by NVIDIA, Multi-Model-Server by Amazon and many others like BentoML or Ray.

There are also tens of formats that you can save your ML model in, just TensorFlow alone is able to save into .h5, .pb, saved_model or .tflite formats, each of them serving a different purpose. For example, TensorFlow Lite is great for smartphones. It also loads very fast, so the availability of the model is great. However, it supports only limited operations and more modern architectures cannot be converted with it.

Machine learning model serving: each big tech company has their own solution for training and serving machine learning models.

You can also try to convert models from PyTorch or TensorFlow to TensorRT and OpenVino formats. The conversion usually works with basic and most-used architectures. The TensorRT is great if you are deploying ML models on Jetson Nano or Xavier. You can achieve a boost in performance on Intel servers via OpenVino conversion or the Neural Magic library.

The ONNX Format

One notable thing is the ONNX format. The ONNX is not a library for training your machine learning models, ONNX is an open format for storing machine learning models. After the model training, for example, in TensorFlow, you can convert it to ONNX format. You are able to run this converted model via ONNX runtime on almost any platform, programming language, CPU architecture and with preferred hardware acceleration. Sometimes serving a model requires a specific version of libraries, which is why you can solve a lot of problems via ONNX.

Exploration is Key

There are a lot of options for ML model training, saving, conversion and deployment. Every library has its pros and cons, some of them are easy to use for training and development. Others, on the other hand, are specialized for specific platforms or for specific fields (computer vision, recommender systems or NLP).

I would recommend you invest some time in exploring all the frameworks and systems, before deciding which framework you would like to lock in. The competition is rough in this field and every company tries to be as innovative as possible to keep up with the others. Even a Chinese company Baidu developed their own solution called PaddlePaddle. At the end of the article, I will give some recommendations on which frameworks and serving systems you should use and when.

The Best Machine Learning Serving Tools

OK, let’s say that you trained your own model or downloaded one that has already been trained. Now you would like to deploy a machine-learning model in production. Here are a few options that you can try.

If you don’t know how to train a machine learning model, you can start with this tutorial by PyTorch.

Deploy ML Models With API

If you have one or a few models, you can build your own system for ML model serving. With Python and libraries such as Flask or Django, there is a straightforward way to develop a simple REST API. When the web service starts, it loads the model in the background and then every incoming request will call the model on the incoming data.

It could get problematic if you want to effectively work with GPU cards, and handle parallel requests. I would recommend packing the system to Docker and then running it in Kubernetes.

With Kubernetes, Docker and smart load-balancing as HAProxy such a system can potentially scale to bigger volumes. Java or Go languages are also good languages to deploy ML models.

Here is a simple tutorial with a sci-kit-learn model as REST API with Flask.

Now let’s have a look at the open-source serving systems that you can use out of the box, usually with a small piece of code or no code at all.

TensorFlow Serving

GitHub | Docs

TensorFlow Serving is a modern serving system for TensorFlow ML models. It’s a part of TensorFlow Extended developed by Google. The recommended way of using the system is via Docker.

Simply run the Docker pull TensorFlow/serving (optionally TensorFlow/serving:latest-gpu if you need GPU support) command. Just run the image via Docker:

docker run -p 8501:8501 
  --mount type=bind,source=/path/to/my_model/,target=/models/my_model 
  -e MODEL_NAME=my_model -t tensorflow/serving

Now that the system is serving your model, you can query with gRPC or REST calls. For more information, read the documentation. TensorFlow Serving works best with the SavedModel format. The model should define its signature_def_map which will define the inputs and outputs of the model. If you would like to dive into the system then my recommendation is videos by the team itself.

In my opinion, TensorFlow serving is great with simple models and just a few versions. The documentation, however, could be simpler. With advanced architectures, you will need to define the custom operations, which is a big disadvantage if you have a lot of models with more modern operations.

TorchServe

GitHub | Docs

TorchServe is a more modern system than TensorFlow Serving. The documentation is clean and supports basically everything that TF Serving does, however, this one is for PyTorch models. Before serving a PyTorch model via TorchServe, you need to convert them to .mar packages. Basically, the .mar package tells the model name, version, architecture and actual weights of the model. Installation and running are also possible via Docker, and it is very similar to TensorFlow Serving.

I personally like the management of the models, you are able to simply register new models by sending API requests, list models and query statistics. I find the TorchServe very simple to use. Both REST API and gRPC are available. If you are working with pure PyTorch models then the TorchServe is recommended way.

Triton

GitHub | Docs

Both of the serving systems mentioned above are tightly bound to the frameworks of the models they are able to serve. That is probably why Triton has a big advantage over them since it can serve both TensorFlow and PyTorch models. It is also able to serve OpenVINO, ONNX and TensorRT formats! That means it supports all the major formats in the machine learning field. Even though NVIDIA developed it, it doesn’t require a GPU card and can run also on CPUs.

To run Triton, simply pull it from the docker repository via the Docker pull nvcr.io/nvidia/tritonserver command. The triton servers are able to load models from a specific directory called model_repository. Each model is defined with configuration, in this configuration, there is a platform setting that defines a model format. For example, “tensorflow_graphdef” or “onnxruntime_onnx“. In this way, Triton knows how to run specific models.

The documentation is not super-easy to read (mostly GitHub README files) because it is in very active development. Otherwise, working with the models is similar to other serving systems, meaning calling models via gRPC or REST.

Ray Serve

GitHub | Docs

Ray is a general-purpose system for scaling machine learning workloads. It primarily focuses on model serving and providing the primitives for you to build your own ML platform on top.

Ray Serve offers a more Pythonic way of creating your own serving system. It is framework-agnostic and anything that can be run via Python can be run also with Ray. Basically, it looks as simple as Flask. You define the simple Python class for your model and decorate it with a route prefix handler. Then you just call the REST API request.

import requests
from starlette.requests import Request
from typing import Dict

from ray import serve

# 1: Define a Ray Serve deployment.
@serve.deployment(route_prefix="/")
class MyModelDeployment:
    def __init__(self, msg: str):
        # Initialize model state: could be very large neural net weights.
        self._msg = msg

    def __call__(self, request: Request) -> Dict:
        return {"result": self._msg}

# 2: Deploy the model.
serve.run(MyModelDeployment.bind(msg="Hello world!"))

# 3: Query the deployment and print the result.
print(requests.get("http://localhost:8000/").json())

If you want to have more control over the system, Ray is a great option. There is a Ray Clusters library which is able to deploy the system on your own Kubernetes Cluster, AWS or GCP with the ability to configure the autoscaling option.

MLflow

MLflow is an open-source platform for the whole ML lifecycle. From training to evaluation, deployment, tracking, model monitoring and central model registry.

MLflow offers a robust API and several language bindings for the whole management of the machine learning model’s lifecycle. There is also a UI for tracking your trained models. MLflow is really a mature package with a whole bundle of components that your team can use.

Other Useful Tools for Machine Learning Model Serving

Multi-Model-Server is a similar system to the previous ones. Developed by the Amazon AWS team, the system is able to run models trained with MXNet or converted via ONNX.
BentoML is a project very similar to MLflow. There are many different tools that data scientists can use for training and deployment processes. The UI looks a bit more modern. BentoML is also able to automatically generate Docker images for your models.
KServe is a simple system for managing and scaling models on your Kubernetes. It solves the deployment, and autoscaling and provides standardized inference protocol across ML frameworks.

Cloud Options of AWS, GCP and Azure

Of course, every big tech player provides cloud platforms to host and serve your machine learning models. Let’s have a quick look at a few examples.

Microsoft is a big supporter of ONNX, so with Azure Machine Learning services, you are able to deploy your models to the cloud via Python or Azure CLI. The process requires an entry script in Python with two methods: init for initialization of your model and run for inference. You can find the entire workflow in Azure development documentation.

The Google Cloud Platform (GCP) has good support for TensorFlow as it is their native framework. However, Docker deployment is available, so other frameworks can be used too. There are multiple ways to achieve the deployment. The classic way will be using the AI Platform prediction tool or Google Cloud Run. There is also a serverless HTTP endpoint/function, which serves your model stored in the Google Cloud Storage bucket. You define your function in Python with the prediction method and loading of the model.

Amazon Web Services (AWS) also contains multiple options for the ML deployment process and serving. The specialized system for machine learning is Amazon Sagemaker.

All the big platforms allow you to create your own virtual server instances. Create your Kubernetes clusters and use any of the systems/frameworks mentioned earlier. Nevertheless, you need to be very careful because it could get really pricey. There are also smaller players on the market such as Banana, Seldon and Comet ML for training, serving & deployment. I personally don’t have experience with them but they are becoming more popular.

Large Language (LLMs) and Multi-Modal Models in Production

With the introduction of GPT by OpenAI a new class of AI models was introduced – the large language models (LLMs). These models are extremely big, trained on massive datasets and deployed on an infrastructure that requires a whole datacenter to run. “Smaller” – usually open source version – models are released but they also require a lot of computational resources and modern servers to run smoothly.

Recently, several serving systems for these models were developed:

OpenLLM by BentoML is a nice system that supports almost all open-source models like Llama2. You can just pick one of the models and run the following commands to start with the serving and query the results:

openllm start opt
export OPENLLM_ENDPOINT=http://localhost:3000
openllm query 'Explain to me the difference between "further" and "farther"'

vLLM project is a Python library that can help you with the deployment of LLM as an API Server. What is great is that it supports OpenAI-Compatible Server, so you can switch from OpenAI paid service easily to open source variant without modifying the code on the client. This project is being developed at UC Berkeley and it is integrating new techniques for fast inferencing of LLMs.
SkyPilot – is a great option if you want to run the LLMs on cloud providers such as AWS, Google Cloud or Azure. Because running these models is costly, SkyPilot is able to pick the cheapest provider automatically and launch it as an endpoint.

Ximilar AI Platform

Free Login | Docs

Last but not least, you can use our codeless machine-learning platform. Instead of writing a lot of code, training and deploying an ML model by yourself, you can try it in the Ximilar App. Training image classification and object detection can be done both in the browser App or via API. There is every tool that you would need in the ML model development stage, such as training data/image management, labelling tools, evaluation of your models on testing and training datasets, performance metrics, explanation of models on specific images, and so on.

Ximilar’s computer vision platform enables you to develop AI-powered systems for image recognition, visual quality control, and more without knowledge of coding or machine learning. You can combine them as you wish and upgrade any of them anytime.

Once your model is trained, it is deployed as a REST API endpoint. It can be connected to a workflow of more machine learning models working together with conditions like if-else statements. The major benefit is you just connect your system to the API and query the results. All the training and serving problems are solved by us. In the end, you will save a lot of costs because you don’t need to own or rent your infrastructure, serving systems or specialized software engineering team on machine learning.

We built a Ximilar Platform so that businesses from e-commerce, healthcare, manufacturing, real estate and other areas could simply develop their own AI models without coding and with a reasonable budget. For example, on the following screen, you can see our task management for the trading cards collector community.

We and our customers use our platform for the training of machine learning models. Together with our own system for machine learning model serving is it an all-in-one solution for ML model deployment.

The great thing is that everything is manageable via REST API requests with JSON responses. Here is a simple curl command to query all models in production:

curl --request GET 
  --url https://api.ximilar.com/recognition/v2/task/ 
  --header 'Content-Type: application/json' 
  --header 'authorization: Token APITOKEN'

Deployment of ML Models is Science

There are a lot of systems that try to make deployment and serving easy. The topic of deployment & serving is broad, with many choices for hardware infrastructure, DevOps, programming languages, system development, costs, storage, and scaling. So it is not easy to pick one. If you would like to dig deeper, I would suggest the following content for further reading:

For the performance test of serving systems, I recommend a post from Biano that includes testing scripts.
A nice overview of all the deployment systems is also in a video lecture on the Full Stack Deep Learning course.

My Final Tips & Recommendations

Pick a good framework to start with

Doing machine learning for more than 10 years, my advice is to start by picking a good framework for model development. In my opinion, the best choice right now is PyTorch. Using it is easy and it supports a lot of state-of-the-art architectures.

I used to be a fan of TensorFlow for a long time, but over time, its developers were not able to integrate modern approaches. Also, the backward compatibility is often disrupted and the quality of code is getting worse which leads to more and more bugs in the framework.

Save your models in different formats

Second, save your models in different formats. I would also recommend using ONNX and OpenVino here. You never know when you will need it. This happened to me a few times. We needed to upgrade the server and systems (our production environment), but the new versions of libraries stopped supporting the specific format of the model, so we had to switch to a different one.

Pick a serving system suitable to your needs

If you are a small company, then Ray Serve is a good option. Bigger companies, on the other hand, have complex requirements for development and robust infrastructure. In this case, I would recommend picking more complex systems like MLFlow. If you would like to serve the models on the cloud, then look at a multi-model server. The choice is really based on the use case. If you don’t want to bother with all of this then try our Ximilar Platform, which is a solution model optimization, model validation, data storage and model deployment as API.

I will keep this article updated and if there is some new perspective serving system I will be more than happy to mention it here. After all, machine learning is about constant progress, and that is one of the things I like about it the most.

The post The Best Tools for Machine Learning Model Serving appeared first on Ximilar: Visual AI for Business.

AI Card Grading – Automate Sports Cards Pre-Grading

Michal Lukáč — Tue, 12 Sep 2023 11:20:08 +0000

In my last blog post, I wrote about our new artificial intelligence services for trading card identification. We created new API endpoints for both sports card recognition and slab reading, and similar solutions for trading card games (TCGs). Such solutions are great for analyzing and cataloguing a large card collection. I also briefly described our card grading endpoint, which was still in development at that time.

Today we are releasing three public API endpoints for evaluating card grade, centering and card condition with AI:

Card Grading – the most complex endpoint that evaluates corners, edges, surface and centering
Card Centering – computing just the centering of the card
Card Condition – simple API for getting condition of the card for marketplace (ebay) submission

In this blog post, I would like to get more in-depth about the AI card grading solution. How we built it, what are the pros and cons, how it is different from PSA grading or Beckett grading services, and how you can use it via REST API for your website or app.

AI Card Grading Services as API

With the latest advances in artificial intelligence, it is becoming increasingly common in our daily lives, and collectible cards are a field that doesn’t get left behind. A lot of startups are developing their own card grading, identification, scanning and documenting systems. Some of them were already successfully sold to big players like eBay or PSA. Just to mention a few:

CollX raises $5.5M to scan and evaluate the value of trading cards
eBay acquires several startups from the collectors industry like TCGplayer and 3PM Shield LLC
Collectible card trading platform Alt raises $75M
PSA Acquires Genamint to introduce Next-Generation Grading process
AI trading card startup Ludex raises $8M

To understand why card grading is so popular, let’s look at the standard grading process and how the industry works.

Standard Grading Process

Card grading has gained widespread popularity in the world of collectibles by offering a trusted way to assess trading cards to collectors. It’s a method that gives a fair and unbiased evaluation of a card’s condition, ensuring its authenticity and value. This appeals to both seasoned collectors who want to preserve their cards’ worth and newcomers looking to navigate the collectible market confidently.

The process involves sending cards to experts who carefully inspect them for qualities like centering, corners, edges, and surface. The standard grading process for trading cards involves these key steps:

Submission: Collectors send their cards to grading companies.
Authentication: Cards are checked for authenticity.
Grading: Cards are assessed for condition and assigned a grade from 1 to 10 on a grading scale by an expert.
Encapsulation: Graded cards are sealed in protective holders.
Labelling & Certification: Labels with card details and grades are added. Cards’ information is recorded for verification. Special labels (such as fugitive ink, QR codes, or serial numbers) are introduced to prevent tampering.
Return/Sale: Graded cards are returned to owners or sold for higher value.

Costs of Grading Services

The price for submitting cards and their grading depends on the company and the card. For example, the minimal grading price per card by PSA (Professional Sports Authenticator) is 15 USD, and it’s much more for more expensive cards.

You can pay hundreds of dollars if you have some rare baseball card from Topps or non-sports cards from Magic The Gathering or Yu-Gi-Oh! If your modern card collection contains hundreds of cards, the pricing can reach astronomical values. Of course, grading often makes the card’s value higher, depending on its condition and grade.

A typical collectible TCG card after the grading process. Some Pokémon cards can cost thousands of dollars, and the value is even higher after grading.

Pros And Cons of Classic Grading

Besides its costliness, classic grading has several other drawbacks:

It is a time-consuming offline process that is not particularly ideal for large-scale grading of whole collections.
Some grading companies would only grade cards with minimum submission value (declared value that is used for insurance).
Also, customers can usually submit only cards from popular series such as Pokémon, Magic The Gathering, Yu-Gi-Oh!, Sport Topps cards, and Sport Panini cards.

Of course, there are also advantages – like a physically sealed slab with a graded card, confirming its authenticity, and grading done by experts who can look at a card from all different angles and not just from a single image.

Nevertheless, there are a lot of steps involved in card grading, and the entire process takes a lot of time and effort. AI grading can help with the entire workflow, from authentication to grading and labelling.

Computer vision can easily and consistently spot printing defects, analyze corners and edges individually and compute centering in a matter of seconds and for a fraction of the price.

Introducing Online AI Card Grading REST API Service

Fast & Affordable AI Card Grading

Our intention is by no means to replace expert grading companies like PSA, BGS, SGC or CGC with AI-powered card grading. We would rather like it to be a faster, more consistent & cheaper alternative for anyone who needs bulk pre-grading of their collections.

One use case for our AI grading service is to use it to automate the estimation of the declared value of the card. A declared value is the estimated value of the collectible card after PSA has graded it (read PSA’s explanation here).

First, you will submit your card for grading by just sending the photo to our API. After obtaining a grade from our service, you can use our visual search system or card ID for a price guide. Actually, you will not only get the final grade of the card but a detailed grading breakdown (for edges, corners, centering, and surface). Then you can decide by yourself if you want to spend more money for physical grading or to sell it on eBay.

How Do We Train AI to Grade Cards?

To build an AI grading system powered by computer vision and machine learning techniques, we needed a lot of data that imitated real-world use cases (usually user-generated content such as smartphone pictures).

We manually destroyed some of our cards and intentionally used their tilted photos. We needed images imitating real-life pictures for annotation and training of machine learning models creating the AI card grading solution.

We spent a lot of time building our own dataset, including damaging our own cards. Our purpose from the beginning was to have a grader that would work both on sports cards and trading card games (TCGs), as well as images of different qualities and with different positioning of the cards.

AI Card Grader Consists of Several AI Models

Our card grading solution integrates a number of machine learning models trained on specific datasets. After you upload a photo of a card, the system needs to be able to correctly detect its position. It then identifies the type of the card: a sports card or a trading card game. Another recognition model identifies whether the picture shows the front or back of the card.

After localization & simple identification, the card gets an individual evaluation of its parts. We trained numerous models for individual grading of corners, edges, card surface, and centering, in accordance with grading standards such as PSA or Beckett.

Of course, different types of cards require a different approach, which is why, for example, we have two different models for corners. While sports cards should have sharp corners, TCG cards are typically more rounded.

From the individual grades, we compute a final grade with condition evaluation. Another model is identifying autographed cards. The cards with autographs are generally more valuable.

AI card grading of individual parts of the back of a sports card.

The big advantage is that the output of the card grading is easy to visualize. That is why we also provide a simple image with the report for each graded card. There you can see a detailed grading breakdown for every part of the card.

Limitations of AI and Machine Learning in Card Grading

Of course, both humans and AI can make mistakes. There are some limitations of the system. Estimating card grades from the images requires relatively high-resolution images, with good lighting conditions and with low post-processing.

As a matter of fact, a lot of modern cameras in smartphones are currently not very good at close-up photos. Their sensors have gotten bigger over the years, and their AI is upscaling the photos. This makes them artificially sharp with cartoon-like effects. This can of course corrupt the overall results. However, as I previously mentioned, that is why we train the models on real-life images and gradually improve their performance.

Let’s Get Some Cards Graded Via Our Online API

Modern Basketball Card

We can test our AI grader via Ximilar App. For this purpose, I chose one of the classic basketball cards of Michael Jordan. BGS (Beckett) gave this card a grade of 6 (EX-MT).

Our online grading system assigned this card a final grade of 6.5. The centering is quite off, so the system graded it 6/10. The grading is still not perfect, as it misses the surface by quite a large margin. However, the final grade is quite close to the one received by Beckett.

AI card grading and grade breakdown by Ximilar demonstrated on a classic basketball card with Michael Jordan.

In the breakdown image, you can see how the system evaluated individual parts of the card. The lines are drawn on the image, so you can see the details of individual grades for corners and edges. We hope that this brings more transparency to the algorithmic grading.

Vintage Baseball Card

Now let’s take a look at an image of a vintage sports card without an autograph. As an example, I chose the baseball card with Ed Mathews.

The final grade that the card receives is 6.0. The average corner value assigned by the system is 4.0 and edges are 7.0. The grade for the surface is 5.5 and the centering is 7.0 (left/right is 36/65 and top/bottom is 38/62).

AI card grading and its visualization by Ximilar with localization and centering.

We can take a look at the corners and think whether a professional grader would assign the same values. I personally think that the grade is reasonable. However, getting grades from a single image is hard. We’re also not trying to make the values precise up to decimals (e.g., 4.12453 for the upper left corner). We want this to be an affordable soft pre-grading solution.

Card corners are one of the reasons why pictures used for AI card grading should have as high resolution as possible.

Card corners are a bit blurry, so ideally, we would like to have a sharper image. However, we can see that the corners are not in the range of 7–10 grades but rather lower (4-6).

How Do We Compute the Final Grade?

We compute the final grade for corners and edges simply as an average of the individual values. We trained the centering grader according to the Beckett grading scale. It is in our opinion much better (has higher demands) than PSA in this case. So to get 10 points for centering, you need to have a 50/50 ratio – on top/bottom and left/right.

The good thing is, that since we provide values for all parts of the card, you don’t need to use our final grades. You can actually create and use your own formula for computing the final grade.

Card Centering API with AI

Some of our customers would like to compute just the centering of the card. That is why we publish also endpoint for this. It will return you offsets from left, right and top and bottom borders of the card. The offsets are relative and also absolute so you can visualize it in your application. Each API response contains image with visualized centering as part of the output:

Computed centering of the Pokemon card.

Lightweight Grading, alias Card Condition Assessment

For customers that want to submit cards to online marketplaces and need to know just the condition of the card like Near Mint, Lightly Played, Heavily Played or Damaged we offer an additional endpoint for getting rough condition of your card. Because this endpoint (/v2/condition) is much simpler and also significantly cheaper than our /v2/grade endpoint. It’s great for a massive amount of data and suitable for collector shops all over the world. The API endpoint can be called from your application or we can write your own script that is able to analyze images/cards from Fujitsu scanners (Fujitsu FI-8170). If you also want to have a card identification service, our visual search AI can identify the TCGs like Pokemon, Magic The Gathering or Yugioh! with more than 98% accuracy.

You can ask to return the condition in several different formats like TCGPlayer, Ebay or our own.

Identification of card condition via Ximilar REST API endpoint with AI.

The more about /v2/condition endpoint can be found in our documentation.

How You Can Test Ximilar Card Grader?

To test our online card grader API, you will need to log into the Ximilar App, where it is currently available to users of all plans for testing purposes. We are also currently working on a public demo.

The system is not perfect, neither is the real human grader. It will take us some time to develop something that will be near perfect and very stable. But I believe that we are on the right track to make AI-powered solutions in the collectibles industry more accessible and cheaper.

To Sum Up

The AI card grader is just one of many solutions by Ximilar that the collector community can use. Make sure to check out our AI Recognition of Collectibles. It is a universal service for the automated detection and recognition of all kinds of collectible items.

Automatic Recognition of Collectibles

Ximilar built an AI system for the detection, recognition and grading of collectibles. Check it out!

If you would like us to customize any solution for collectors, just contact us and we will get back to you. We created these solutions (Card Identification and Card Grading) to be the best publicly available AI tools for collectors.

The post AI Card Grading – Automate Sports Cards Pre-Grading appeared first on Ximilar: Visual AI for Business.

When OCR Meets ChatGPT AI in One API

Michal Lukáč — Wed, 14 Jun 2023 09:38:27 +0000

Imagine a world where machines not only have the ability to read text but also comprehend its meaning, just as effortlessly as we humans do. Over the past two years, we have witnessed extraordinary advancements in these areas, driven by two remarkable technologies: optical character recognition (OCR) and ChatGPT (generative pre-trained transformer). The combined potential of these technologies is enormous and offers assistance in numerous fields.

That is why we in Ximilar have recently developed an OCR system, integrated it with ChatGPT and made it available via API. It is one of the first publicly available services combining OCR software and the GPT model, supporting several alphabets and languages. In this article, I will provide an overview of what OCR and ChatGPT are, how they work, and – more importantly – how anyone can benefit from their combination.

What is Optical Character Recognition (OCR)?

OCR (Optical Character Recognition) is a technology that can quickly scan documents or images and extract text data from them. OCR engines are powered by artificial intelligence & machine learning. They use object detection, pattern recognition and feature extraction.

An OCR software can actually read not only printed but also handwritten text in an image or a document and provide you with extracted text information in a file format of your choosing.

How Optical Character Recognition Works?

When an OCR engine is provided with an image, it first detects the position of the text. Then, it uses AI model for reading individual characters to find out what the text in the scanned document says (text recognition).

This way, OCR tools can provide accurate information from virtually any kind of image file or document type. To name a few examples: PDF files containing camera images, scanned documents (e.g., legal documents), old printed documents such as historical newspapers, or even license plates.

A few examples of OCR: transcribing books to electronic form, reading invoices, passports, IDs, and landmarks.

Most OCR tools are optimized for specific languages and alphabets. We can tune these tools in many ways. For example, to automate the reading of invoices, receipts, or contracts. They can also specialize in handwritten or printed paper documents.

The basic outputs from OCR tools are usually the extracted texts and their locations in the image. The data extracted with these tools can then serve various purposes, depending on your needs. From uploading the extracted text to simple Word documents to turning the recognized text to speech format for visually impaired users.

OCR programs can also do a layout analysis for transforming text into a table. Or they can integrate natural language processing (NLP) for further text analysis and extraction of named entities (NER). For example, identifying numbers, famous people or locations in the text, like ‘Albert Einstein’ or ‘Eiffel Tower’.

Technologies Related to OCR

You can also meet the term optical word recognition (OWR). This technology is not as widely used as the optical character recognition software. It involves the recognition and extraction of individual words or groups of words from an image.

There is also optical mark recognition (OMR). This technology can detect and interpret marks made on paper or other media. It can work together with OCR technology, for instance, to process and grade tests or surveys.

And last but not least, there is intelligent character recognition (ICR). It is a specific OCR optimised for the extraction of handwritten text from an image. All these advanced methods share some underlying principles.

What are GPT and ChatGPT?

Generative pre-trained transformer (GPT), is an AI text model that is able to generate textual outputs based on input (prompt). GPT models are large language models (LLMs) powered by deep learning and relying on neural networks. They are incredibly powerful tools and can do content creation (e.g., writing paragraphs of blog posts), proofreading and error fixing, explaining concepts & ideas, and much more.

The Impact of ChatGPT

ChatGPT introduced by OpenAI and Microsoft is an extension of the GPT model, which is further optimized for conversations. It has had a great impact on how we search, work with and process data.

GPT models are trained on huge amounts of textual data. So they have better knowledge than an average human being about many topics. In my case, ChatGPT has definitely better English writing & grammar skills than me. Here’s an example of ChatGPT explaining quantum computing:

ChatGPT model explaining quantum computing. [source: OpenAI]

It is no overstatement to say that the introduction of ChatGPT revolutionized data processing, analysis, search, and retrieval.

How Can OCR & GPT Be Combined For Smart Text Extraction

The combination of OCR with GPT models enables us to use this technology to its full potential. GPT can understand, analyze and edit textual inputs. That is why it is ideal for post-processing of the raw text data extracted from images with OCR technology. You can give the text to the GPT and ask simple questions such as “What are the items on the invoice and what is the invoice price?” and get an answer with the exact structure you need.

This was a very hard problem just a year ago, and a lot of companies were trying to build intelligent document-reading systems, investing millions of dollars in them. The large language models are really game changers and major time savers. It is great that they can be combined with other tools such as OCR and integrated into visual AI systems.

It can help us with many things, including extraction of essential information from images and putting them into text documents or JSON. And in the future, it can revolutionize search engines, and streamline automated text translation or entire workflows of document processing and archiving.

Examples of OCR Software & ChatGPT Working Together

So, now that we can combine computer vision and advanced natural language processing, let’s take a look at how we can use this technology to our advantage.

Reading, Processing and Mining Invoices From PDFs

One of the typical examples of OCR software is reading the data from invoices, receipts, or contracts from image-only PDFs (or other documents). Imagine a part of invoices and receipts your accounting department accepts are physical printed documents. You could scan the document, and instead of opening it in Adobe Acrobat and doing manual data entry (which is still a standard procedure in many accounting departments today), you would let the automated OCR system handle the rest.

Scanned documents can be automatically sent to the API from both computers and mobile phones. The visual AI needs only a few hundred milliseconds to process an image. Then you will get textual data with the desired structure in JSON or another format. You can easily integrate such technology into accounting systems and internal infrastructures to streamline invoice processing, payments or SKU numbers monitoring.

Receipt analysis via Ximilar OCR and OpenAI ChatGPT.

Trading Card Identifying & Reading Powered by AI

In recent years, the collector community for trading cards has grown significantly. This has been accompanied by the emergence of specialized collector websites, comparison platforms, and community forums. And with the increasing number of both cards and their collectors, there has been a parallel demand for automating the recognition and cataloguing collectibles from images.

Ximilar has been developing AI-powered solutions for some of the biggest collector websites on the market. And adding an OCR system was an ideal solution for data extraction from both cards and their graded slabs.

Automatic Recognition of Collectibles

Ximilar built an AI system for the detection, recognition and grading of collectibles. Check it out!

We developed an OCR system that extracts all text characters from both the card and its slab in the image. Then GPT processes these texts and provides structured information. For instance, the name of the player, the card, its grade and name of grading company, or labels from PSA.

Extracting text from the trading card via OCR and then using GPT prompt to get relevant information.

Needless to say, we are pretty big fans of collectible cards ourselves. So we’ve been enjoying working on AI not only for sports cards but also for trading card games. We recently developed several solutions tuned specifically for the most popular trading card games such as Pokémon, Magic the Gathering or YuGiOh! and have been adding new features and games constantly. Do you like the idea of trading card recognition automation? See how it works in our public demo.

Try demo

How Can I Use the OCR & GPT API On My Images or PDFs?

Our OCR software is publicly available via an online REST API. This is how you can use it:

Log into Ximilar App
- Get your free API TOKEN to connect to API – Once you sign up to Ximilar App, you will get a free API token, which allows your authentication. The API documentation is here to help you with the basic setup. You can connect it with any programming language and any platform like iOS or Android. We provide a simple Python SDK for calling the API.
- You can also try the service directly in the App under Computer Vision Platform.
For simple text extraction from your image, call the endpoint read.
```
https://api.ximilar.com/ocr/v2/read
```
For text extraction from an image and its post-processing with GPT, use the endpoint read_gpt. To get the results in a deserved structure, you will need to specify the prompt query along with your input images in the API request, and the system will return the results immediately.
```
https://api.ximilar.com/ocr/v2/read_gpt
```
The output is JSON with an ‘_ocr’ field. This dictionary contains texts that represent a list of polygons that encapsulate detected words and sentences in images. The full_text field contains all strings concatenated together. The API is returning also the language name (“lang_name”) and language code (“lang”; ISO 639-1). Here is an example:
```
{
  "_url": "__URL_PATH_TO_IMAGE__
  "_ocr": {
     "texts": [
       {
          "polygon": [[53.0,76.0],[116.0,76.0],[116.0,94.0],[53.0,94.0]],
          "text": "MICKEY MANTLE",
          "prob": 0.9978849291801453
       },
       ...
     ],
     "full_text": "MICKEY MANTLE 1st Base Yankees",
     "lang_name": "english",
     "lang_code": "en
  }
}
```
Our OCR engine supports several alphabets (Latin, Chinese, Korean, Japanese and Cyrillic) and languages (English, German, Chinese, …).

Integrate the Combination of OCR and ChatGPT In Your System

All our solutions, including the combination of OCR & GPT, are available via API. Therefore, they can be easily integrated into your system, website, app, or infrastructure.

Here are some examples of up-to-date solutions that can easily be built on our platform and automate your workflows:

Detection, recognition & text extraction system – You can let the users of your website or app upload images of collectibles and get relevant information about them immediately. Once they take an image of the item, our system detects its position (and can mark it with a bounding box). Then, it recognizes their features (e.g., name of the card, collectible coin or comic book), extracts texts with OCR and you will get text data for your website (e.g., in a table format).
Card grade reading system – If your users upload images of graded cards or other collectibles, our system can detect everything including the grades and labels on the slabs in a matter of milliseconds.
Comic book recognition & search engine – You can extract all texts from each image of a comic book and automatically match it to your database for cataloguing.
Giving your collection or database of collectibles order – Imagine you have a website featuring a rich collection of collectible items, getting images from various sources and comparing their prices. The metadata can be quite inconsistent amongst source websites, or be absent in the case of user-generated content. AI can recognize, match, find and extract information from images based purely on computer vision and independent of any kind of metadata.

Let’s Build Your Solution

If you would like to learn more about how you can automate the workflows in your company, I recommend browsing our page All Solutions, where we briefly explained each solution. You can also check out pages such as Visual AI for Collectibles, or contact us right away to discuss your unique use case. If you’d like to learn more about how we work on customer projects step by step, go to How it Works.

Ximilar’s computer vision platform enables you to develop AI-powered systems for image recognition, visual quality control, and more without knowledge of coding or machine learning. You can combine them as you wish and upgrade any of them anytime.

Don’t forget to visit the free public demo to see how the basic services work. Your custom solution can be assembled from many individual services. This modular structure enables us to upgrade or change any piece anytime, while you save your money and time.

How do custom projects work?

The post When OCR Meets ChatGPT AI in One API appeared first on Ximilar: Visual AI for Business.

How to Build a Good Visual Search Engine?

Michal Lukáč — Mon, 09 Jan 2023 14:08:28 +0000

Visual search is one of the most-demanded computer vision solutions. Our team in Ximilar have been actively developing the best general multimedia visual search engine for retailers, startups, as well as bigger companies, who need to process a lot of images, video content, or 3D models.

However, a universal visual search solution is not the only thing that customers around the world will require in the future. Especially smaller companies and startups now more often look for custom or customizable visual search solutions for their sites & apps, built in a short time and for a reasonable price. What does creating a visual search engine actually look like? And can a visual search engine be built by anyone?

This article should provide a bit deeper insight into the technology behind visual search engines. I will describe the basic components of a visual search engine, analyze approaches to machine learning models and their training datasets, and share some ideas, training tips, and techniques that we use when creating visual search solutions. Those who do not wish to build a visual search from scratch can skip right to Building a Visual Search Engine on a Machine Learning Platform.

What Exactly Does a Visual Search Engine Mean?

The technology of visual search in general analyses the overall visual appearance of the image or a selected object in an image (typically a product), observing numerous features such as colours and their transitions, edges, patterns, or details. It is powered by AI trained specifically to understand the concept of similarity the way you perceive it.

In a narrow sense, the visual search usually refers to a process, in which a user uploads a photo, which is used as an image search query by a visual search engine. This engine in turn provides the user with either identical or similar items. You can find this technology under terms such as reverse image search, search by image, or simply photo & image search.

However, reverse image search is not the only use of visual search. The technology has numerous applications. It can search for near-duplicates, match duplicates, or recommend more or less similar images. All of these visual search tools can be used together in an all-in-one visual search engine, which helps internet users find, compare, match, and discover visual content.

And if you combine these visual search tools with other computer vision solutions, such as object detection, image recognition, or tagging services, you get a quite complex automated image-processing system. It will be able to identify images and objects in them and apply both keywords & image search queries to provide as relevant search results as possible.

Different computer vision systems can be combined on Ximilar platform via Flows. If you would like to know more, here’s an article about how Flows work.

Typical Visual Search Engines:
Google Lens & Pinterest Lens

Big visual search industry players such as Shutterstock, eBay, Pinterest (Pinterest Lens) or Google Images (Google Lens & Google Images) already implemented visual search engines, as well as other advanced, yet hidden algorithms to satisfy the increasing needs of online shoppers and searchers. It is predicted, that a majority of big companies will implement some form of soft AI in their everyday processes in the next few years.

The Algorithm for Training
Visual Similarity

The Components of a Visual Search Tool

Multimedia search engines are very powerful systems consisting of multiple parts. The first key component is storage (database). It wouldn’t be exactly economical to store the full sample (e.g., .jpg image or .mp4 video) in a database. That is why we do not store any visual data for visual search. Instead, we store just a representation of the image, called a visual hash.

The visual hash (also visual descriptor or embedding) is basically a vector, representing the data extracted from your image by the visual search. Each visual hash should be a unique combination of numbers to represent a single sample (image). These vectors also have some mathematical properties, meaning you can compare them, e.g., with cosine, hamming, or Euclidean distance.

So the basic principle of visual search is: the more similar the images are, the more similar will their vector representations be. Visual search engines such as Google Lens are able to compare incredible volumes of images (i.e., their visual hashes) to find the best match in a hundred milliseconds via smart indexing.

How to Create a Visual Hash?

The visual hashes can be extracted from images by standard algorithms such as PHASH. However, the era of big data gives us a much stronger model for vector representation – a neural network. A simple overview of the image search system built with a neural network can look like this:

Extracting visual vectors with the neural network and searching with them in a similarity collection.

This neural network was trained on images from a website selling cosmetics. Here, it extracted the embeddings (vectors), and they were stored in a database. Then, when a customer uploads an image to the visual search engine on the website, the neural network will extract the embedding vector from this image as well, and use it to find the most similar samples.

Of course, you could also store other metadata in the database, and do advanced filtering or add keyword search to the visual search.

Types of Neural Networks

There are several basic architectures of neural networks that are widely used for vector representations. You can encode almost anything with a neural network. The most common for images is a convolutional neural network (CNN).

There are also special architectures to encode words and text. Lately, so-called transformer neural networks are starting to be more popular for computer vision as well as for natural language processing (NLP). Transformers use a lot of new techniques developed in the last few years, such as an attention mechanism. The attention mechanism, as the name suggests, is able to focus only on the “interesting” parts of the image & ignore the unnecessary details.

Training the Similarity Model

There are multiple methods to train models (neural networks) for image search. First, we should know that training of machine learning models is based on your data and loss function (also called objective or optimization function).

Optimization Functions

The loss function usually computes the error between the output of the model and the ground truth (labels) of the data. This feature is used for adjusting the weights of a model. The model can be interpreted as a function and its weights as parameters of this function. Therefore, if the value of the loss function is big, you should adjust the weights of the model.

How it Works

The model is trained iteratively, taking subsamples of the dataset (batches of images) and going over the entire dataset multiple times. We call one such pass of the dataset an epoch. During one batch analysis, the model needs to compute the loss function value and adjust weights according to it. The algorithm for adjusting the weights of the model is called backpropagation. Training is usually finished when the loss function is not improving (minimizing) anymore.

We can divide the methods (based on loss function) depending on the data we have. Imagine that we have a dataset of images, and we know the class (category) of each image. Our optimization function (loss function) can use these classes to compute the error and modify the model.

The advantage of this approach is its simple implementation. It’s practically only a few lines in any modern framework like TensorFlow or PyTorch. However, it has also a big disadvantage: the class-level optimization functions don’t scale well with the number of classes. We could potentially have thousands of classes (e.g., there are thousands of fashion products and each product represents a class). The computation of such a function with thousands of classes/arguments can be slow. There could also be a problem with fitting everything on the GPU card.

Loss Function: A Few Tips

If you work with a lot of labels, I would recommend using a pair-based loss function instead of a class-based one. The pair-based function usually takes two or more samples from the same class (i.e., the same group or category). A model based on a pair-based loss function doesn’t need to output prediction for so many unique classes. Instead, it can process just a subsample of classes (groups) in each step. It doesn’t know exactly whether the image belongs to class 1 or 9999. But it knows that the two images are from the same class.

Images can be labelled manually or by a custom image recognition model. Read more about image recognition systems.

The Distance Between Vectors

The picture below shows the data in the so-called vector space before and after model optimization (training). In the vector space, each image (sample) is represented by its embedding (vector). Our vectors have two dimensions, x and y, so we can visualize them. The objective of model optimization is to learn the vector representation of images. The loss function is forcing the model to predict similar vectors for samples within the same class (group).

By similar vectors, I mean that the Euclidean distance between the two vectors is small. The larger the distance, the more different these images are. After the optimization, the model assigns a new vector to each sample. Ideally, the model should maximize the distance between images with different classes and minimize the distance between images of the same class.

Optimization for visual search should maximize the distance of items between different categories and minimize the distance within the category.

Sometimes we don’t know anything about our data in advance, meaning we do not have any metadata. In such cases, we need to use unsupervised or self-supervised learning, about which I will talk later in this article. Big tech companies do a lot of work with unsupervised learning. Special models are being developed for searching in databases. In research papers, this field is often called deep metric learning.

Supervised & Unsupervised Machine Learning Methods

1) Supervised Learning

As I mentioned, if we know the classes of images, the easiest way to train a neural network for vectors is to optimize it for the classification problem. This is a classic image recognition problem. The loss function is usually cross-entropy loss. In this way, the model is learning to predict predefined classes from input images. For example, to say whether the image contains a dog, a cat or a bird. We can get the vectors by removing the last classification layer of the model and getting the vectors from some intermediate layer of the network.

When it comes to the pair-based loss function, one of the oldest techniques for metric learning is the Siamese network (contrastive learning). The name contains “Siamese” because there are two identical models of the same weights. In the Siamese network, we need to have pairs of images, which we label based on whether they are or aren’t equal (i.e., from the same class or not). Pairs in the batch that are equal are labelled with 1 and unequal pairs with 0.

In the following image, we can see different batch construction methods that depend on our model: Siamese (contrastive) network, Triplet, or N-pair, which I will explain below.

Each deep learning architecture requires different batch construction methods. For example, Siamese and N-pair require tuples. However, in N-pair, the tuples must be unique.

Triplet Neural Network and Online/Offline Mining

In the Triplet method, we construct triplets of items, two of which (anchor and positive) belong to the same category and the third one (negative) to a different category. This can be harder than you might think because picking the “right” samples in the batch is critical. If you pick items that are too easy or too difficult, the network will converge (adjust weights) very slowly or not at all. The triplet loss function contains an important constant called margin. Margin defines what should be the minimum distance between positive and negative samples.

Picking the right samples in deep metric learning is called mining. We can find optimal triplets via either offline or online mining. The difference is, that during offline mining, you are finding the triplets at the beginning of each epoch.

Online & Offline Mining

The disadvantage of offline mining is that computing embeddings for each sample is not very computationally efficient. During the epoch, the model can change rapidly, so embeddings are becoming obsolete. That’s why online mining of triplets is more popular. In online mining, each batch of triplets is created before fitting the model. For more information about mining and batch strategies for triplet training, I would recommend this post.

We can visualize the Triplet model training in the following way. The model is copied three times, but it has the same shared weights. Each model takes one image from the triplet (anchor, positive, negative) and outputs the embedding vector. Then, the triplet loss is computed and weights are adjusted with backpropagation. After the training is done, the model weights are frozen and the output of the embeddings is used in the similarity engine. Because the three models have shared weights (the same), we take only one model that is used for predicting embedding vectors on images.

Triplet network that takes a batch of anchor, positive and negative images.

N-pair Models

The more modern approach is the N-pair model. The advantage of this model is that you don’t mine negative samples, as it is with a triplet network. The batch consists of just positive samples. The negative samples are mitigated through the matrix construction, where all non-diagonal items are negative samples.

You still need to do online mining. For example, you can select a batch with a maximum value of the loss function, or pick pairs that are distant in metric space.

The N-pair model requires a unique pair of items. In the triplet and Siamese model, your batch can contain multiple triplets/pairs from the same class (group).

In our experience, the N-pair model is much easier to fit, and the results are also better than with the triplet or Siamese model. You still need to do a lot of experiments and know how to tune other hyperparameters such as learning rate, batch size, or model architecture. However, you don’t need to work with the margin value in the loss function, as it is in triplet or Siamese. The small drawback is that during batch creation, we need to have always only two items per class/product.

Proxy-Based Methods

In the proxy-based methods (Proxy-Anchor, Proxy-NCA, Soft Triple) the model is trying to learn class representatives (proxies) from samples. Imagine that instead of having 10,000 classes of fashion products, we will have just 20 class representatives. The first representative will be used for shoes, the second for dresses, the third for shirts, the fourth for pants and so on.

A big advantage is that we don’t need to work with so many classes and the problems coming with it. The idea is to learn class representatives and instead of slow mining “the right samples” we can use the learned representatives in computing the loss function. This leads to much faster training & convergence of the model. This approach, as always, has some cons and questions like how many representatives should we use, and so on.

MultiSimilarity Loss

Finally, it is worth mentioning MultiSimilarity Loss, introduced in this paper. MultiSimilarity Loss is suitable in cases when you have more than two items per class (images per product). The authors of the paper are using 5 samples per class in a batch. MultiSimilarity can bring closer items within the same class and push the negative samples far away by effectively weighting informative pairs. It works with three types of similarities:

Self-Similarity (the distance between the negative sample and anchor)
Positive-Similarity (the relationship between positive pairs)
Negative-Similarity (the relationship between negative pairs)

Finally, it is also worth noting, that in fact, you don’t need to use only one loss function, but you can combine multiple loss functions. For example, you can use the Triplet Loss function with CrossEntropy and MultiSimilarity or N-pair together with Angular Loss. This should often lead to better results than the standalone loss function.

2) Unsupervised Learning

AutoEncoder

Unsupervised learning is helpful when we have a completely unlabelled dataset, meaning we don’t know the classes of our images. These methods are very interesting because the annotation of data can be very expensive and time-consuming. The most simplistic unsupervised learning can simply use some form of AutoEncoder.

AutoEncoder is a neural network consisting of two parts: an encoder, which encodes the image to the smaller representation (embedding vector), and a decoder, which is trying to reconstruct the original image from the embedding vector.

After the whole model is trained, and the decoder is able to reconstruct the images from smaller vectors, the decoder part is discarded and only the encoder part is used in similarity search engines.

Simple AutoEncoder neural network for learning embeddings via reconstruction of the image.

There are many other solutions for unsupervised learning. For example, we can train AutoEncoder architecture to colourize images. In this technique, the input image has no colour and the decoding part of the network tries to output a colourful image.

Image Inpainting

Another technique is Image Inpainting, where we remove part of the image and the model will learn to inpaint them back. Interesting way to propose a model that is solving jigsaw puzzles or correct ordering of frames of a video.

Then there are more advanced unsupervised models like SimCLR, MoCo, PIRL, SimSiam or GAN architectures. All these models try to internally represent images so their outputs (vectors) can be used in visual search systems. The explanation of these models is beyond this article.

Tips for Training Deep Metric Models

Here are some useful tips for training deep metric learning models:

Batch size plays an important role in deep metric learning. Some methods such as N-pair should have bigger batch sizes. Bigger batch sizes generally lead to better results, however, they also require more memory on the GPU card.
If your dataset has a bigger variation and a lot of classes, use a bigger batch size for Multi-similarity loss.
The most important part of metric learning is your data. It’s a pity that most research, as well as articles, focus only on models and methods. If you have a large collection with a lot of products, it is important to have a lot of samples per product. If you have fewer classes, try to use some unsupervised method or cross-entropy loss and do heavy augmentations. In the next section, we will look at data in more depth.
Try to start with a pre-trained model and tune the learning rate.
When using Siamese or Triplet training, try to play with the margin term, all the modern frameworks will allow you to change it (make it harder) during the training.
Don’t forget to normalize the output of the embedding if the loss function requires it. Because we are comparing vectors, they should be normalized in a way that the norm of the vectors is always 1. This way, we are able to compute Euclidean or cosine distances.
Use advanced methods such as MultiSimilarity with big batch size. If you use Siamese, Triplet, or N-pair, mining of negatives or positives is essential. Start with easier samples at the beginning and increase the challenging samples every epoch.

Neural Text Search on Images with CLIP

Up to right now, we were talking purely about images and searching images with image queries. However, a common use case is to search the collection of images with text input, like we are doing with Google or Bing search. This is also called Text-to-Image problem, because we need to transform text representation to the same representation as images (same vector space). Luckily, researchers from OpenAI develop a simple yet powerful architecture called CLIP (Contrastive Language Image Pre-training). The concept is simple, instead of training on pair of images (SIAMESE, NPAIR) we are training two models (one for image and one for text) on pairs of images and texts.

The architecture of CLIP model by OpenAI. Image Source Github

You can train a CLIP model on a dataset and then use it on your images (or videos) collection. You are able to find similar images/products or try to search your database with a text query. If you would like to use a CLIP-like model on your data, we can help you with the development and integration of the search system. Just contact us at care@ximilar.com, and we can create a search system for your data.

Try search demo

The Training Data
for Visual Search Engines

99 % of the deep learning models have a very expensive requirement: data. Data should not contain any errors such as wrong labels, and we should have a lot of them. However, obtaining enough samples can be a problematic and time-consuming process. That is why techniques such as transfer learning or image augmentation are widely used to enrich the datasets.

How Does Image Augmentation Help With Training Datasets?

Image augmentation is a technique allowing you to multiply training images and therefore expand your dataset. When preparing your dataset, proper image augmentation is crucial. Each specific category of data requires unique augmentation settings for the visual search engine to work properly. Let’s say you want to build a fashion visual search engine based strictly on patterns and not the colours of items. Then you should probably employ heavy colour distortion and channel-swapping augmentation (randomly swapping red, green, or blue channels of an image).

On the other hand, when building an image search engine for a shop with coins, you can rotate the images and flip them to left-right and upside-down. But what to do if the classic augmentations are not enough? We have a few more options.

Removing or Replacing Background

Most of the models that are used for image search require pairs of different images of the same object. Typically, when training product image search, we use an official product photo from a retail site and another picture from a smartphone, such as a real-life photo or a screenshot. This way, we get a pair-based model that understands the similarity of a product in pictures with different backgrounds, lights, or colours.

The difference between a product photo and a real-life image made with a smartphone, both of which are important to use when training computer vision models.

All such photos of the same product belong to an entity which we call a Similarity Group. This way, we can build an interactive tool for your website or app, which enables users to upload a real-life picture (sample) and find the product they are interested in.

Background Removal Solution

Sometimes, obtaining multiple images of the same group can be impossible. We found a way to tackle this issue by developing a background removal model that can distinguish the dominant foreground object from its background and detect its pixel-accurate position.

Once we know the exact location of the object, we can generate new photos of products with different backgrounds, making the training of the model more effective with just a few images.

The background removal can also be used to narrow the area of augmentation only to the dominant item, ignoring the background of the image. There are a lot of ways to get the original product in different styles, including changing saturation, exposure, highlights and shadows, or changing the colours entirely.

Generating more variants can make your model very robust.

Building such an augmentation pipeline with background/foreground augmentation can take hundreds of hours and a lot of GPU resources. That is why we deployed our Background Removal solution as a ready-to-use image tool.

You can use the Background Removal as a stand-alone service for your image collections, or as a tool for training data augmentation. It is available in public demo, App, and via API.

GAN-Based Methods for Generating New Training Data

One of the modern approaches is to use a Generative Adversarial Network (GAN). GANs are incredibly powerful in generating whole new images from some specific domain. You can simply create a model for generating new kinds of insects or making birds with different textures.

Creating new insect images automatically to train an image recognition system? How cool is that? There are endless possibilities with GAN models for basically any image type. [Source]

The greatest advantage of GAN is you will easily get a lot of new variants, which will make your model very robust. GANs are starting to be widely used in more tasks such as simulations, and I think the gathering of data will cost much less in the near future because of them. In Ximilar, we used GAN to create a GAN Image Upscaler, which adds new relevant pixels to images to increase their resolution and quality.

When creating a visual search system on our platform, our team picks the most suitable neural network architecture, loss functions, and image augmentation settings through the analysis of your visual data and goals. All of these are critical for the optimization of a model and the final accuracy of the system. Some architectures are more suitable for specific problems like OCR systems, fashion recommenders or quality control. The same goes with image augmentation, choosing the wrong settings can destroy the optimization. We have experience with selecting the best tools to solve specific problems.

Annotation System for Building Image Search Datasets

As we can see, a good dataset definitely is one of the key elements for training deep learning models. Obtaining such a collection can be quite expensive and time-consuming. With some of our customers, we build a system that continually gathers the images needed in the training datasets (for instance, through a smartphone app). This feature continually & automatically improves the precision of the deployed search engines.

How does it work? When the new images are uploaded to Ximilar Platform (through Custom Similarity service) either via App or API, our annotators can check them and use them to enhance the training dataset in Annotate, our interface dedicated to image annotation & management of datasets for computer vision systems.

Annotate effectively works with the similarity groups by grouping all images of the same item. The annotator can add the image to a group with the relevant Stock Keeping Unit (SKU), label it as either a product picture or a real-life photo, add some tags, or mark objects in the picture. They can also mark images that should be used for the evaluation and not used in the training process. In this way, you can have two separate datasets, one for training and one for evaluation.

We are quite proud of all the capabilities of Annotate, such as quality control, team cooperation, or API connection. There are not many web-based data annotation apps where you can effectively build datasets for visual search, object detection, as well as image recognition, and which are connected to a whole visual AI platform based on computer vision.

Image annotation tool for building visual search and image similarity models.

How to Improve Visual Search Engine Results?

We already assessed that the optimization algorithm and the training dataset are key elements in training your similarity model. And that having multiple images per product then significantly increases the quality of the trained similarity model. The model (CNN or other modern architecture) for similarity is used for embedding (vector) extraction, which determines the quality of image search.

Over the years that we’ve been training visual search engines for various customers around the world, we were also able to identify several potential weak spots. Their fixing really helped with the performance of searches as well as the relevance of the search results. Let’s take a look at what can improve your visual search engine:

Include Tags

Adding relevant keywords for every image can improve the search results dramatically. We recommend using some basic words that are not synonymous with each other. The wrong keywords for one item are for instance “sky, skyline, cloud, cloudy, building, skyscraper, tall building, a city”, while the good alternative keywords would be “sky, cloud, skyscraper, city”.

Our engine can internally use these tags and improve the search results. You can let an image recognition system label the images instead of adding the keywords manually.

Include Filtering Categories

You can store the main categories of images in their metadata. For instance, in real estate, you can distinguish photos that were taken inside or outside. Based on this, the searchers can filter the search results and improve the quality of the searches. This can also be easily done by an image recognition task.

Include Dominant Colours

Colour analysis is very important, especially when working for a fashion or home decor shop. We built a tool conveniently called Dominant Colors, with several extraction options. The system can extract the main colours of a product while ignoring its background. Searchers can use the colours for advanced filtering.

Use Object Detection & Segmentation

Object detection can help you focus the view of both the search engine and its user on the product, by merely cutting the detected object from the image. You can also apply background removal to search & showcase the products the way you want. For training object detection and other custom image recognition models, you can use our App & Annotate.

Use Optical Character Recognition (OCR)

In some domains, you can have products with text. For instance, wine bottles or skincare products with the name of the item and other text labels that can be read by artificial intelligence, stored as metadata and used for keyword search on your site.

Our visual search engine allows us to combine several features for multimedia search with advanced filtering.

Improve Image Resolution

If the uploaded images from the mobile phones have low resolution, you can use the image upscaler to increase the resolution of the image, screenshot, or video. This way, you will get as much as possible even from user-generated content with potentially lower quality.

Combine Multiple Approaches

Fusion – Combining multiple features like model embeddings, tags, dominant colours, and text increases your chances to build a solid visual search engine. Our system is able to use these different modalities and return the best items accordingly. For example, extracting dominant colours is really helpful in Fashion Search, our service combining object detection, fashion tagging & visual search.

Search Engine and Vector Databases

Once you trained your model (neural network), you can extract and store the embeddings for your multimedia items somewhere. There are a lot of image search engine implementations that are able to work with vectors (embedding representation) that you can use. For example, Annoy from Spotify or FAISS from Facebook developers.

These solutions are open-source (i.e. you don’t have to deal with usage rights) and you can use them for simple solutions. However, they also have a few disadvantages:

After the initial build of the search engine database, you cannot perform any update, insert or delete operations. Once you store the data, you can only perform search queries.
You are unable to use a combination of multiple features, such as tags, colours, or metadata.
There’s no support for advanced filtering for more precise results.
You need to have an IT background and coding skills to implement and use them. And in the end, the system must be deployed on some server, which brings additional challenges.
It is difficult to extend them for advanced use cases, you will need to learn a complex codebase of the project and adjust it accordingly.

Building a Visual Search Engine on a Machine Learning Platform

The creation of a great visual search engine is not an easy task. The mentioned challenges and disadvantages of building complex visual search engines with high performance are the reasons why a lot of companies hesitate to dedicate their time and funds to building them from scratch. That is where AI platforms like Ximilar come into play.

Custom Similarity Service

Ximilar provides a computer vision platform, where a fast similarity engine is available as a service. Anyone can connect via API and fill their custom collection with data and query at the same time. This streamlines the tedious workflow a lot, enabling people to have custom visual search engines fast and, more importantly, without coding. Our image search engines can handle other data types like videos, music, or 3D models. If you want more privacy for your data, the system can also be deployed on your hardware infrastructure.

In all industries, it is important to know what we need from our model and optimize it towards the defined goal. We developed our visual search services with this in mind. You can simply define your data and problem and what should be the primary goal for this similarity. This is done via similarity groups, where you put the items that should be matched together.

Examples of Visual Search Solutions for Business

One of the typical industries that use visual search extensively is fashion. Here, you can look at similarities in multiple ways. For instance, one can simply want to find footwear with a colour, pattern, texture, or shape similar to the product in a screenshot. We built several visual search engines for fashion e-shops and especially price comparators, which combined search by photo and recommendations of alternative similar products.

Based on a long experience with visual search solutions, we deployed several ready-to-use services for visual search: Visual Product Search, a complex visual search service for e-commerce including technologies such as search by photo, similar product recommendations, or image matching, and Fashion Search created specifically for the fashion segment.

Another nice use case is also the story of how we built a Pokémon Trading Card search engine. It is no surprise that computer vision has been recently widely applied in the world of collectibles. Trading card games, sports cards or stamps and visual AI are a perfect match. Based on our customers’ demand, we also created several AI solutions specifically for collectibles.

The Workflow of Building
a Visual Search Engine

If you are looking to build a custom search engine for your users, we can develop a solution for you, using our service Custom Image Similarity. This is the typical workflow of our team when working on a customized search service:

Setup, Research & Plan – Initial calls, the definition of the project, NDA, and agreement on expected delivery time.
Data – If you don’t provide any data, we will gather it for you. Gathering and curating datasets is the most important part of developing machine learning models. Having a well-balanced dataset without any bias to any class leads to great performance in production.
First prototype – Our machine learning team will start working on the model and collection. You will be able to see the first results within a month. You can test it and evaluate it by yourself via our clickable front end.
Development – Once you are satisfied with the results, we will gather more data and do more experiments with the models. This is an iterative way of improving the model.
Evaluation & Deployment – If the system performs well and meets the criteria set up in the first calls (mostly some evaluation on the test dataset and speed performance), we work on the deployment. We will show you how to connect and work with the API for visual similarity (insert, delete, search endpoints).

If you are interested in knowing more about how the cooperation with Ximilar works in general, read our How it works and contact us anytime.

We are also able to do a lot of additional steps, such as:

Managing and gathering more training data continually after the deployment to gradually increase the performance of visual similarity (the usage rights for user-generated content are up to you; keep in mind that we don’t store any physical images).
Building a customized model or multiple models that can be integrated into the search engine.
Creating & maintaining your visual search collection, with automatic synchronization to always keep up to date with your current stock.
Scaling the service to hundreds of requests per second.

Visual Search is Not Only
For the Big Companies

I presented the basic techniques and architectures for training visual similarity models, but of course, there are much more advanced models and the research of this field continues with mile steps.

Search engines are practically everywhere. It all started with AltaVista in 1995 and Google in 1998. Now it’s more common to get information directly from Siri or Alexa. Searching for things with visual information is just another step, and we are glad that we can give our clients tools to maximise their potential. Ximilar has a lot of technical experience with advanced search technology for multimedia data, and we work hard to make it accessible to everyone, including small and medium companies.

If you are considering implementing visual search into your system:

Schedule a call with us and we will discuss your goals. We will set up a process for getting the training data that are necessary to train your machine learning model for search engines.
In the following weeks, our machine learning team will train a custom model and a testable search collection for you.
After meeting all the requirements from the POC, we will deploy the system to production, and you can connect to it via Rest API.

How do custom projects work?

The post How to Build a Good Visual Search Engine? appeared first on Ximilar: Visual AI for Business.