Object Detection - Ximilar: Visual AI for Business

When OCR Meets ChatGPT AI in One API

Michal Lukáč — Wed, 14 Jun 2023 09:38:27 +0000

Imagine a world where machines not only have the ability to read text but also comprehend its meaning, just as effortlessly as we humans do. Over the past two years, we have witnessed extraordinary advancements in these areas, driven by two remarkable technologies: optical character recognition (OCR) and ChatGPT (generative pre-trained transformer). The combined potential of these technologies is enormous and offers assistance in numerous fields.

That is why we in Ximilar have recently developed an OCR system, integrated it with ChatGPT and made it available via API. It is one of the first publicly available services combining OCR software and the GPT model, supporting several alphabets and languages. In this article, I will provide an overview of what OCR and ChatGPT are, how they work, and – more importantly – how anyone can benefit from their combination.

What is Optical Character Recognition (OCR)?

OCR (Optical Character Recognition) is a technology that can quickly scan documents or images and extract text data from them. OCR engines are powered by artificial intelligence & machine learning. They use object detection, pattern recognition and feature extraction.

An OCR software can actually read not only printed but also handwritten text in an image or a document and provide you with extracted text information in a file format of your choosing.

How Optical Character Recognition Works?

When an OCR engine is provided with an image, it first detects the position of the text. Then, it uses AI model for reading individual characters to find out what the text in the scanned document says (text recognition).

This way, OCR tools can provide accurate information from virtually any kind of image file or document type. To name a few examples: PDF files containing camera images, scanned documents (e.g., legal documents), old printed documents such as historical newspapers, or even license plates.

A few examples of OCR: transcribing books to electronic form, reading invoices, passports, IDs, and landmarks.

Most OCR tools are optimized for specific languages and alphabets. We can tune these tools in many ways. For example, to automate the reading of invoices, receipts, or contracts. They can also specialize in handwritten or printed paper documents.

The basic outputs from OCR tools are usually the extracted texts and their locations in the image. The data extracted with these tools can then serve various purposes, depending on your needs. From uploading the extracted text to simple Word documents to turning the recognized text to speech format for visually impaired users.

OCR programs can also do a layout analysis for transforming text into a table. Or they can integrate natural language processing (NLP) for further text analysis and extraction of named entities (NER). For example, identifying numbers, famous people or locations in the text, like ‘Albert Einstein’ or ‘Eiffel Tower’.

Technologies Related to OCR

You can also meet the term optical word recognition (OWR). This technology is not as widely used as the optical character recognition software. It involves the recognition and extraction of individual words or groups of words from an image.

There is also optical mark recognition (OMR). This technology can detect and interpret marks made on paper or other media. It can work together with OCR technology, for instance, to process and grade tests or surveys.

And last but not least, there is intelligent character recognition (ICR). It is a specific OCR optimised for the extraction of handwritten text from an image. All these advanced methods share some underlying principles.

What are GPT and ChatGPT?

Generative pre-trained transformer (GPT), is an AI text model that is able to generate textual outputs based on input (prompt). GPT models are large language models (LLMs) powered by deep learning and relying on neural networks. They are incredibly powerful tools and can do content creation (e.g., writing paragraphs of blog posts), proofreading and error fixing, explaining concepts & ideas, and much more.

The Impact of ChatGPT

ChatGPT introduced by OpenAI and Microsoft is an extension of the GPT model, which is further optimized for conversations. It has had a great impact on how we search, work with and process data.

GPT models are trained on huge amounts of textual data. So they have better knowledge than an average human being about many topics. In my case, ChatGPT has definitely better English writing & grammar skills than me. Here’s an example of ChatGPT explaining quantum computing:

ChatGPT model explaining quantum computing. [source: OpenAI]

It is no overstatement to say that the introduction of ChatGPT revolutionized data processing, analysis, search, and retrieval.

How Can OCR & GPT Be Combined For Smart Text Extraction

The combination of OCR with GPT models enables us to use this technology to its full potential. GPT can understand, analyze and edit textual inputs. That is why it is ideal for post-processing of the raw text data extracted from images with OCR technology. You can give the text to the GPT and ask simple questions such as “What are the items on the invoice and what is the invoice price?” and get an answer with the exact structure you need.

This was a very hard problem just a year ago, and a lot of companies were trying to build intelligent document-reading systems, investing millions of dollars in them. The large language models are really game changers and major time savers. It is great that they can be combined with other tools such as OCR and integrated into visual AI systems.

It can help us with many things, including extraction of essential information from images and putting them into text documents or JSON. And in the future, it can revolutionize search engines, and streamline automated text translation or entire workflows of document processing and archiving.

Examples of OCR Software & ChatGPT Working Together

So, now that we can combine computer vision and advanced natural language processing, let’s take a look at how we can use this technology to our advantage.

Reading, Processing and Mining Invoices From PDFs

One of the typical examples of OCR software is reading the data from invoices, receipts, or contracts from image-only PDFs (or other documents). Imagine a part of invoices and receipts your accounting department accepts are physical printed documents. You could scan the document, and instead of opening it in Adobe Acrobat and doing manual data entry (which is still a standard procedure in many accounting departments today), you would let the automated OCR system handle the rest.

Scanned documents can be automatically sent to the API from both computers and mobile phones. The visual AI needs only a few hundred milliseconds to process an image. Then you will get textual data with the desired structure in JSON or another format. You can easily integrate such technology into accounting systems and internal infrastructures to streamline invoice processing, payments or SKU numbers monitoring.

Receipt analysis via Ximilar OCR and OpenAI ChatGPT.

Trading Card Identifying & Reading Powered by AI

In recent years, the collector community for trading cards has grown significantly. This has been accompanied by the emergence of specialized collector websites, comparison platforms, and community forums. And with the increasing number of both cards and their collectors, there has been a parallel demand for automating the recognition and cataloguing collectibles from images.

Ximilar has been developing AI-powered solutions for some of the biggest collector websites on the market. And adding an OCR system was an ideal solution for data extraction from both cards and their graded slabs.

Automatic Recognition of Collectibles

Ximilar built an AI system for the detection, recognition and grading of collectibles. Check it out!

We developed an OCR system that extracts all text characters from both the card and its slab in the image. Then GPT processes these texts and provides structured information. For instance, the name of the player, the card, its grade and name of grading company, or labels from PSA.

Extracting text from the trading card via OCR and then using GPT prompt to get relevant information.

Needless to say, we are pretty big fans of collectible cards ourselves. So we’ve been enjoying working on AI not only for sports cards but also for trading card games. We recently developed several solutions tuned specifically for the most popular trading card games such as Pokémon, Magic the Gathering or YuGiOh! and have been adding new features and games constantly. Do you like the idea of trading card recognition automation? See how it works in our public demo.

Try demo

How Can I Use the OCR & GPT API On My Images or PDFs?

Our OCR software is publicly available via an online REST API. This is how you can use it:

Log into Ximilar App
- Get your free API TOKEN to connect to API – Once you sign up to Ximilar App, you will get a free API token, which allows your authentication. The API documentation is here to help you with the basic setup. You can connect it with any programming language and any platform like iOS or Android. We provide a simple Python SDK for calling the API.
- You can also try the service directly in the App under Computer Vision Platform.
For simple text extraction from your image, call the endpoint read.
```
https://api.ximilar.com/ocr/v2/read
```
For text extraction from an image and its post-processing with GPT, use the endpoint read_gpt. To get the results in a deserved structure, you will need to specify the prompt query along with your input images in the API request, and the system will return the results immediately.
```
https://api.ximilar.com/ocr/v2/read_gpt
```
The output is JSON with an ‘_ocr’ field. This dictionary contains texts that represent a list of polygons that encapsulate detected words and sentences in images. The full_text field contains all strings concatenated together. The API is returning also the language name (“lang_name”) and language code (“lang”; ISO 639-1). Here is an example:
```
{
  "_url": "__URL_PATH_TO_IMAGE__
  "_ocr": {
     "texts": [
       {
          "polygon": [[53.0,76.0],[116.0,76.0],[116.0,94.0],[53.0,94.0]],
          "text": "MICKEY MANTLE",
          "prob": 0.9978849291801453
       },
       ...
     ],
     "full_text": "MICKEY MANTLE 1st Base Yankees",
     "lang_name": "english",
     "lang_code": "en
  }
}
```
Our OCR engine supports several alphabets (Latin, Chinese, Korean, Japanese and Cyrillic) and languages (English, German, Chinese, …).

Integrate the Combination of OCR and ChatGPT In Your System

All our solutions, including the combination of OCR & GPT, are available via API. Therefore, they can be easily integrated into your system, website, app, or infrastructure.

Here are some examples of up-to-date solutions that can easily be built on our platform and automate your workflows:

Detection, recognition & text extraction system – You can let the users of your website or app upload images of collectibles and get relevant information about them immediately. Once they take an image of the item, our system detects its position (and can mark it with a bounding box). Then, it recognizes their features (e.g., name of the card, collectible coin or comic book), extracts texts with OCR and you will get text data for your website (e.g., in a table format).
Card grade reading system – If your users upload images of graded cards or other collectibles, our system can detect everything including the grades and labels on the slabs in a matter of milliseconds.
Comic book recognition & search engine – You can extract all texts from each image of a comic book and automatically match it to your database for cataloguing.
Giving your collection or database of collectibles order – Imagine you have a website featuring a rich collection of collectible items, getting images from various sources and comparing their prices. The metadata can be quite inconsistent amongst source websites, or be absent in the case of user-generated content. AI can recognize, match, find and extract information from images based purely on computer vision and independent of any kind of metadata.

Let’s Build Your Solution

If you would like to learn more about how you can automate the workflows in your company, I recommend browsing our page All Solutions, where we briefly explained each solution. You can also check out pages such as Visual AI for Collectibles, or contact us right away to discuss your unique use case. If you’d like to learn more about how we work on customer projects step by step, go to How it Works.

Ximilar’s computer vision platform enables you to develop AI-powered systems for image recognition, visual quality control, and more without knowledge of coding or machine learning. You can combine them as you wish and upgrade any of them anytime.

Don’t forget to visit the free public demo to see how the basic services work. Your custom solution can be assembled from many individual services. This modular structure enables us to upgrade or change any piece anytime, while you save your money and time.

How do custom projects work?

The post When OCR Meets ChatGPT AI in One API appeared first on Ximilar: Visual AI for Business.

How to deploy object detection on Nvidia Jetson Nano

Michal Lukáč — Mon, 18 Oct 2021 12:13:16 +0000

At the beginning of summer, we received a request for a custom project for a camera system in a factory located in Africa. The project was about detecting, counting, and visual quality control of the items on the conveyor belts in a factory with the help of visual AI. So we developed a complex system with neural networks on a small computer called Jetson Nano. If you are curious about how we did it, this article is for you. And if you need help with building similar solutions for your factory, our team and tools are here for you.

What is NVIDIA Jetson Nano?

There were two reasons why using our API was not an option. First, the factory has unstable internet connectivity. Also, the entire solution needs to run in real time. So we chose to experiment with embedded hardware that can be deployed in such an environment, and we are very glad that we found Nvidia Jetson Nano.

[Source]

Jetson Nano is an amazing small computer (embedded or edge device) built for AI. It allows you to do machine learning in a very efficient way with low-power consumption (about 5 watts). It can be a part of IoT (Internet of Things) systems, running on Ubuntu & Linux, and is suitable for simple robotics or computer vision projects in factories. However, if you know that you will need to detect, recognize and track tens of different labels, choose the higher version of Jetson embedded hardware, such as Xavier. It is a much faster device than Nano and can solve more complex problems.

What is Jetson Nano good for?

Jetson is great if:

You need a real-time analysis
Your problem can be solved with one or two simple models
You need a budget solution & be cost-effective when running the system
You want to connect it to a static camera – for example, monitoring an assembly line
The system cannot be connected to the internet – for example, because your factory is in a remote place or for security reasons

The biggest challenges in Africa & South Africa remain connectivity and accessibility. AI systems that can run in house and offline can have great potential in such environments.
Deloitte: Industry 4.0 – Is Africa ready for digital transformation?

Object Detection with Jetson Nano

If you need real-time object detection processing, use the Yolo-V4-Tiny model proposed in this repository AlexeyAB/darknet. And other more powerful architectures are available as well. Here is a table of what FPS you can expect when using Yolo-V4-Tiny on Jetson:

Architecture	mAP @ 0.5	FPS
yolov4-tiny-288	0.344	36.6
yolov4-tiny-416	0.387	25.5
yolov4-288	0.591	7.93

Source: Github

After the model’s training is completed, the next step is the conversion of the weights to the TensorRT runtime. TensorRT runtimes make a substantial difference in speed performance on Jetson Nano. So train the model with AlexeyAB/darknet and then convert it with tensorrt_demos repository. The conversion has multiple steps because you first convert darknet Yolo weights to ONNX and then convert to TensorRT.

There is always a trade-off between accuracy and speed. If you do not require a fast model, we also have a good experience with Centernet. Centernet can achieve a really nice mAP with precise boxes. If you run models with TensorFlow or PyTorch backends, then the speed is slower than Yolo models in our experience. Luckily, we can train both architectures and export them in a suitable format for Nvidia Jetson Nano.

Image Recognition on Jetson Nano

For any image categorization problem, I would recommend using simple architecture as MobileNetV2. You can select for example the depth multiplier for mobilenet of 0.35 and image resolution 128×128 pixels. In this way, you can achieve great performance both in speed and precision.

We recommend using TFLITE backend when deploying the recognition model on Jetson Nano. So train the model with the TensorFlow framework and then convert it to TFLITE. You can train recognition models with our platform without any coding for free. Just visit Ximilar App, where you can develop powerful image recognition models and download them for offline usage on Jetson Nano.

A simple Object Detection camera system with the counting of products can be deployed offline in your factory with Jetson Nano.

Recommended camera and utilities

Jetson Nano is simple but powerful hardware. However, it is not as powerful as your laptop or desktop computer. That’s why analyzing 4k images on Jetson will be very slow. I would recommend using max 1080p camera resolution. We used a camera by Raspberry PI, which works very well on Jetson and installation is easy!

I should mention that with Jetson Nano, you can come across some temperature issues. Jetson is normally shipped with a passive cooling system. However, if this small piece of hardware should be in the factory, and run stable for 24 hours, we recommend using an active cooling system like this one. Don’t forget to run the next command so your fan on Jetson starts working:

sudo jetson_clocks --fan

Installation steps & tips for development

When working with Jetson Nano, I recommend following guidelines by Nvidia, for example here is how to install the latest TensorFlow version. There is a great tool called jtop, which visualizes hardware stats as GPU frequency, temperature, memory size, and much more:

jtop tool can help you monitor statistics on Nvidia Jetson Nano.

Remember, the Jetson has shared memory with GPU. You can easily run out of 4 GB when running the model and some programs alongside. If you want to save more than 0.5 GB of memory on Jetson, then run the Ubuntu on LXDE desktop environment/interface. The LXDE is more lightweight than the default Ubuntu environment. To increase memory, you can also create a swap file. But be aware that if your project requires a lot of memory, it can eventually destroy your microSD card. More great tips and hacks can be found on JetsonHacks page.

For improvement of the speed of Jetson, you can also try these two commands, which will set the maximum power input and frequency:

sudo nvpmodel -m0
sudo jetson_clocks

When using the latest image for Jetson, be sure that you are working with the right OpenCV versions of the library. For example, some older tracking algorithms like MOSSE or KCF from OpenCV require a specific version. For some tracking solutions, I recommend looking on PyImageSearch website.

Developing on Jetson Nano

The experience of programming challenging projects, exploring new gadgets, and helping our customers is something that deeply satisfies us. We are looking forward to trying other hardware for machine learning such as Coral from Google, Raspberry Pi, or Intel Movidius for Industry 4.0 projects.

Most of the time, we are developing a machine learning API for large e-commerce sites. We are really glad that our platform can also help us build machine learning models on devices running in distant parts of the world with no internet connectivity. I think that there are many more opportunities for similar projects in the future.

The post How to deploy object detection on Nvidia Jetson Nano appeared first on Ximilar: Visual AI for Business.

Image Annotation Tool for Teams

Michal Lukáč — Thu, 06 May 2021 11:55:57 +0000

Through the years, we worked with many annotation tools. The problem is most of the desktop annotating apps are offline and intended for single-person use, not for team cooperation. The web-based apps, on the other hand, mostly focus on data management with photo annotation, and not on the whole ecosystem with API and inference systems. In this article, I review, what should a good image annotation tool do, and explain the basic features of our own tool – Annotate.

Every big machine learning project requires the active cooperation of multiple team members – engineers, researchers, annotators, product managers, or owners. For example, supervised deep learning for object detection, as well as segmentation, outperforms unsupervised solutions. However, it requires a lot of data with correct annotations. Annotation of images is one of the most time-consuming parts of every deep learning project. Therefore, picking the right annotator tool is critical. When your team is growing and your projects require higher complexity over time, you may encounter new challenges, such as:

Adding labels to the taxonomy would require re-checking a lot of your work
Increasing the performance of your models would require more data
You will need to monitor the progress of your projects

Building solid annotation software for computer vision is not an easy task. And yes, it requires a lot of failures and taking many wrong turns before finding the best solution. So let’s look at what should be the basic features of an advanced data annotation tool.

What Should an Advanced Image Annotation Tool Do?

Many customers are using our cloud platform Ximilar App in very specific areas, such as Fashion, Healthcare, Security, or Industry 4.0. The environment of a proper AI helper or tool should be complex enough to cover requirements like:

Features for team collaboration – you need to assign tasks, and then check the quality and consistency of data
Great user experience for dataset curation – everything should be as simple as possible, but no simpler
Fast production of high-quality datasets for your machine-learning models
Work with complex taxonomies & many models chained with Flows
Fast development and prototyping of new features
Connection to Rest API with Python SDK & querying annotated data

With these needs in mind, we created our own image annotation tool. We use it in our internal projects and provide it to our customers as well. Our technologies for machine learning accelerate the entire pipeline of building good datasets. Whether you are a freelancer tagging pictures or a team managing product collections in e-commerce, Annotate can help.

Our Visual AI tools enable you to work with your own custom taxonomy of objects, such as fashion apparel or things captured by the camera. You can read the basics on the categories & tags and machine learning model training, watch the tutorials, or check our demo and see for yourself how it works.

The Annotate

Annotate is an advanced image annotation tool, which enables you to annotate images precisely and fast. It works as an end-to-end platform for visual data management. You can query the same images, change labels, create objects, draw bounding boxes and even polygons here.

It is a web-based online annotation tool, that works fully on the cloud. Since it is connected to the same back-end & database as Ximilar App, all changes you do in Annotate, manifest in your workspace in App, and vice versa. You can create labels, tasks & models, or upload images through the App, and use them in Annotate.

Ximilar Application and Annotate are connected to the same backend (api.ximilar.com) and the same database.

Annotate extends the functionalities of the Ximilar App. The App is great for training, creating entities, uploading data, and batch management of images (bulk actions for labelling and filtering). Annotate, on the other hand, was created for the detail-oriented management of images. The default single-zoomed image view brings advantages, such as:

Identifying separate objects, drawing polygons and adding metadata to a single image
Suggestions based on AI image recognition help you choose from very complex taxonomies
The annotators focus on one image at a time to minimize the risk of mistakes

Interested in getting to know Annotate better? Let’s have a look at its basic functions.

Deep Focus on a Single Image

If you enter the Images (left menu), you can open any image in the single image view. To the right of the image, you can see all the items located in it. This is where most of the labelling is done. There is also a toolbar for drawing objects and polygons, labelling images, and inspecting metadata.

In addition, you can zoom in/out and drag the image. This is especially helpful when working with smaller objects or big-resolution images. For example, teams annotating medical microscope samples or satellite pictures can benefit from this robust tool.

The main view of the image in our Fashion Tagging workspace

Create Multiple Workspaces

Some of you already know this from other SaaS platforms. The idea is to divide your data into several independent storages. Imagine your company is working on multiple projects at the same time and each of them requires you to label your data with an image annotation tool. Your company account can have many workspaces, each for one project.

Here is our active workspace for Fashion Tagging

Within the workspaces, you don’t mix your images, labels, and tasks. For example, one workspace contains only images for fruit recognition projects (apples, oranges, and bananas) and another contains data on animals (cats and dogs).

Your team members can get access to different workspaces. Also, everyone can switch between the workspaces in the App as well as in Annotate (top right, next to the user icon). Did you know, that the workspaces are also accessible via API? Check out our documentation and learn how to connect to API.

See API Documentation

Train Precise AI Models with Verification

Building good computer vision models requires a lot of data, high-quality annotations, and a team of people who understand the process of building such a dataset. In short, to create high-quality models, you need to understand your data and have a perfectly annotated dataset. In the words of the Director of AI at Tesla, Andrej Karpathy:

Annotate helps you build high-quality AI training datasets by verification. Every image can be verified by different users in the workspace. You can increase the precision by training your models only on verified images.

A list of users who verified the image with the exact dates

Verifying your data is a necessary requirement for the creation of good deep-learning models. To verify the image, simply click the button verify or verify and next (if you are working on a job). You will be able to see who verified any particular image and when.

Create and Track Image Annotating Jobs

When you need to process the newly uploaded images, you can assign them to a Job and a team of people can process them one by one in a job queue. You can also set up exactly how many times each image should be seen by the people processing this queue.

Moreover, you can specify, which photo recognition model or flow of models should be displayed when doing the job. For example, here is the view of the jobs that we are using in one of our tagging services.

Two jobs are waiting to be completed by annotators,
you can start working by hitting the play button on the right

When working on a job, every time an annotator hits the Verify & Next button, it will redirect them to a new image within a job. You can track the progress of each job in the Jobs. Once the image annotation job is complete, the progress bar turns green, and you can proceed to the next steps: retraining the models, uploading new images, or creating another job.

Draw Objects and Polygons

Sometimes, recognizing the most probable category or tags for an image is not enough. That is why Annotate provides a possibility to identify the location of specific things by drawing objects and polygons. The great thing is that you are not paying any credits for drawing objects or labelling. This makes Annotate one of the most cost-effective online apps for image annotation.

Simply click and drag the rectangle with the rectangle tool on canvas to create the detection object.

What exactly do you pay for, when annotating data? The only API credits are counted for data uploads, with volume-based discounts. This makes Annotate an affordable, yet powerful tool for data annotation. If you want to know more, read our newest Article on API Credit Packs, check our Pricing Plans or Documentation.

Annotate With Complex Taxonomies Elegantly

The greatest advantage of Annotate is working with very complex taxonomies and attribute hierarchies. That is why it is usually used by companies in E-commerce, Fashion, Real Estate, Healthcare, and other areas with rich databases. For example, our Fashion tagging service contains more than 600 labels that belong to more than 100 custom image recognition models. The taxonomy tree for some of the biotech projects can be even broader.

Navigating through the taxonomy of labels is very elegant in Annotate – via Flows. Once your Flow is defined (our team can help you with it), you simply add labels to the images. The branches expand automatically when you add labels. In other words, you always see only essential labels for your images.

Simply navigate through your taxonomy tree, expanding branches when clicking on specific labels.

For example, in this image is a fashion object “Clothing”, to which we need to assign more labels. Adding the Clothing/Dresses label will expand the tags that are in the Length Dresses and Style Dresses tasks. If you select the label Elegant from Style Dresses, only features & attributes you need will be suggested for annotation.

Automate Repetitive Tasks With AI

Annotate was initially designed to speed up the work when building computer vision solutions. When annotating data, manual drawing & clicking is a time-consuming process. That is why we created the AI helper tools to automate the entire annotating process in just a few clicks. Here are a few things that you can do to speed up the entire annotation pipeline:

Use the API to upload your previously annotated data to train or re-train your machine learning models and use them to annotate or label more data via API
Create bounding boxes and polygons for object detection & instance object segmentation with one click
Create jobs, share the data, and distribute the tasks to your team members

Predicting bounding boxes with one click automates the entire process of annotation.

Image Annotation Tool for Advanced Visual AI Training

As the main focus of Ximilar is AI for sorting, comparing, and searching multimedia, we integrate the annotation of images into the building of AI search models. This is something that we miss in all other data annotation applications. For the building of such models, you need to group multiple items (images or objects, typically product pictures) into the Similarity Groups. Annotate helps us create datasets for building strong image similarity search models.

Grouping the same or similar images with the Image Annotation Tool. You can tell which item is a smartphone photo or which photos should be located on an e-commerce platform.

Annotate is Always Growing

Annotate was originally developed as our internal image annotation software, and we have already delivered a lot of successful solutions to our clients with it. It is a unique product that any team can benefit from and improve the computer vision models unbelievably fast.

We plan to introduce more data formats like videos, satellite imagery (sentinel maps), 3D models, and more in the future to level up the Visual AI in fields such as visual quality control or AI-assisted healthcare. We are also constantly working on adding new features and improving the overall experience of Ximilar services.

Annotate is available for all users with Business & Professional pricing plans. Would you like to discuss your custom solution or ask anything? Let’s talk! Or read how the cooperation with us works first.

How do custom projects work?

The post Image Annotation Tool for Teams appeared first on Ximilar: Visual AI for Business.

Image Recognition as an Answer to New Energy Labelling

Zuzana Raidová — Wed, 27 Jan 2021 08:45:30 +0000

The year 2021 will bring a fundamental change in the energy labelling of household appliances. Updated labelling should be more efficient, and intuitive, and enable consumers to make better and more informed purchasing decisions. A first large group of goods should be re-labelled by the beginning of March, not only in retail but also in e-shops. Even though such modification brings benefits to the buyers, it poses a great challenge to the online sellers, to which we in Ximilar have a clever solution.

Upcoming Changes in the EU Energy Labelling

The energy labels indicate the energy efficiency category the appliance falls into. In 2019, the European Union approved a new regulation setting a framework for updated energy labelling, which will come into force in 2021 and gradually replace the old system of labels. According to European lawmakers, the new system could save up to 200 billion kWh of energy, which is approximately the same amount of energy all Baltic countries spend together in a year. The first new labels are already in circulation.

Effective March 2021, sellers and manufacturers will be required to update the energy labels on fridges, washing machines, dishwashers, TVs, electronic displays, and refrigerating appliances for display purposes, followed by tyres in May, and lamps in September.

So far, the products have fallen into categories A+++ to G, which will be simplified back to A to G and the energy class of a product will be determined by higher standards. This means the appliance that was A+ in 2020 could be B or C from now on.

Re-scaling is not the only new feature, as the new labels are provided with a QR code leading consumers to the EPREL (European Product Registry for Energy Labelling) database, providing them with detailed energy and environmental information on the goods.

A Challenge for E-commerce Industry

The new regulation applies not only to retail but also to e-commerce, meaning all e-shops will be required to re-label the household appliances as well. They will be required to do so between March 1^st and 18^th.

E-shops need to identify thousands of energy labels in the product galleries and replace them with the new ones.

E-shops generally upload the energy labels as pictures into the galleries on the item pages. Due to the large amounts of images they upload every day, it is not uncommon not to have them tagged.

To ensure a smooth transition from the old label system to the new one, the physical stores will focus on the re-labelling of the displayed goods. The e-shops, on the other hand, will need to identify and replace considerable amounts of pictures in their databases at once. For instance, the largest e-shop selling household appliances in the Czech Republic Alza.cz currently offers approximately 1 200 products in the category of fridges, 500 in washing machines, 350 in dishwashers, 600 TVs, and 1 200 monitors, meaning they will need to update at least 3 850 energy labels in the first wave.

Many large e-shops also cooperate with price comparison websites, such as Heureka, that have their item galleries. For such services, the problem is a bit more complex: as a price analysis tool, the comparison website acquires its data from various sellers meaning its picture tagging or sorting is not standardised, and they have to deal with a wide range of file types and names.

Example of an old EU energy label in a product gallery at Heureka.cz

Such task poses a question: what is the most efficient way to identify the old energy labels amongst other images in the product galleries in order to delete and replace them? The solution lies in the image recognition software.

Smart Solution: Image Recognition

E-shops with electronics typically upload the energy labels as images into the product galleries on their item pages and provide them to the price comparison websites. Therefore, they need software able to sort the product images, reliably recognize the old energy labels and set them aside.

Image Recognition is one of the core services of Ximilar. In principle, once you upload your images to this service, it equips them with tags and sorts them into categories. This service uses computer vision and deep learning to detect a wide range of features in the pictures. It is designed to process extensive databases of pictures in a fraction of a second.

With Ximilar App, you can develop an AI service directly for energy label recognition.

How to Use the Image Recognition on Energy Labels

If you need to identify and replace the old energy labels in your e-shop, there are two ways to use the Ximilar Energy Label Recognition service:

You can train your own recognition model for energy-label images. Then you can use the model as an API endpoint. Meaning, you will send images from the product gallery and get immediate feedback on whether they are or aren’t energy labels.
You can provide us with an export from your product image database (as image URLs or the actual files) and we will take care of the rest for you. You will get the output back in a standard CSV format.

Since image recognition is a CPU/GPU-intensive process, one of the greatest advantages of this service lies in the image database processing on our servers, whether you use the API or leave it to us. Of course, you will have a chance to test the service in the Ximilar App before you run it on your image database.

The energy label recognition with the Ximilar service is an efficient, quick, and above all, reliable way to identify the images that need to be replaced.

Try it in Ximilar App

With Ximilar you can develop more models for energy labels recognition:

Reliable recognition of the old energy labels from the new ones. This might be handy in the transition period when some labels will be already replaced, but others will not.
Reading the actual energy class, especially from the new energy labels. The energy label change is a great opportunity to enrich your product data by this piece of information.

If you are interested, please just fill out our contact form. We are here to help!

The Image Recognition Service Makes E-commerce Easier

Whether you need to sort your catalogue into fine-grained categories, recognize pictures in product galleries, or offer similar products to your customers, Ximilar has a solution for you.

Read more in this detailed article on Image Recognition uses in e-commerce, or contact us, and we can discuss other solutions tailored to your needs.

Try our public demos

The post Image Recognition as an Answer to New Energy Labelling appeared first on Ximilar: Visual AI for Business.

How to Train an Object Detection Model With One Click

Michal Lukáč — Fri, 04 Sep 2020 12:47:05 +0000

Introducing Custom Object Detection on Click!

With our newly released object detection, you are able to train models for finding objects on your images. Ximilar solution allows you to combine Recognition and Detection models in one workflow through the Flows service. On click, without a single line of code!

We are glad that you love our Custom Image Recognition service, which helps you effectively build classification and tagging models. Over time, we have received a lot of messages that you are missing a service for training object detection models. We have spent a lot of time on it, and we know why – making your life easier when building such models. Training detection models of good quality can be quite challenging, and we wanted to be sure to deliver the best solution possible.

What Is Object Detection

The difference between recognition and detection is the following: in recognition, we are interested if a feature/item is present on our image. In reality, there could be many of these items in the image and one would like to know their count and positions. This is exactly the task for object detection. Object detection models can predict the exact locations of items in the form of bounding boxes – rectangles around the objects.

If you want to know more about the technology behind it, read the blog post from our ML specialist Libor Vaněk.

Creating Your First Model Step-by-Step

Define Your Task (Model)

Just log in to app.ximilar.com and click on the Object Detection tile on the dashboard. Click on Create New Task and set the name and description (optional). After that, you need to create detection labels and connect them to the Task. Click on Create New Label tile for your first detection label. After doing this, your task definition is complete. Your task now contains one label, but you can create and connect more.

Upload Your Data

Now we need to upload our dataset and create bounding boxes on your images. Go to the Images page and start uploading. Then go through each of the images and create objects/bounding boxes on them.

As with the Image Recognition service, we recommend starting with a small dataset of about 50 images per label and then increasing the counts. If you already have your dataset with bounding boxes on your local computer, you can use Ximilar Client to upload them.

Train the Model and See the Results

Once your training collection is ready, click the TRAIN button on the TASK page. Training will take some time (up to several hours), so make a coffee and relax.

After the model is successfully optimized, you can use the detect endpoint and test it in production or even connect to the API with Ximilar Client.

Upload More Data

There is a good chance that after the first round, your model will require more images and objects. However, you already have some semi-perfect models trained, and you can use them to help you with creating Bounding Boxes on your new training images – just use the Predict button below the training image. If you want, you can create your independent TEST dataset, you can do it by using the test flag. See the video below.

Flows With Object Detection

This is our most powerful feature right now. You can build a really complex computer vision system by connecting detection and recognition models into a single API endpoint. Imagine first detecting individual items on the image and then recognizing their attributes. This is possible with the new Flows action “Object Selector”. What are the example use cases?

detect all the items on a production line and identify if they have a defect or not
detect fashion products on the person and recognize all their attributes
find the exact position and recognize tooth decays
count and classify all the cars from the parking camera
object recognition for insurance damage and cost prediction
and many more

We will go through one of these examples in an upcoming blog post. Follow us on social media [LN | FB | TW | IN] so you will not miss anything important.

Tell Us About Your Ideas

This is one of the best solutions for detecting bounding boxes, which is available in the market. Why choose our solution?

The UX is great, and we made it really straightforward to use it.
Great performance with SOTA architectures behind it.
The price is affordable.
Download models for offline usage on our higher pricing plans.
Detect items on your images and then recognize features with image recognition through the Flows service.
Configure your image augmentation settings for training and get better performance.
You can A/B test model versions and evaluate the accuracy on an independent dataset.
We are using it in our own custom services, and we keep it updated with new techniques and architectures

If you love this new feature, you would like to discuss anything with us, or you have some custom project from computer vision, then contact us, and we can schedule a call with you.

Try our public demos

The post How to Train an Object Detection Model With One Click appeared first on Ximilar: Visual AI for Business.

Is Ximilar Better Than AI Giants?

Víťa Válka — Tue, 18 Jun 2019 07:43:37 +0000

We get this question occasionally from users of other Visual AI analysis tools, and the simple answer could be yes, it’s better. Nothing is as simple as black and white, so let us compare services from Goliaths like Google, IBM, Amazon and Microsoft with our David-like solution from Ximilar.

To say it simply, artificial intelligence vision got to a point, where it is easy not only to recognize objects in a photo, but also detect features of each thing. That creates a new universe of opportunities for real-world application in e-commerce & traditional industries alike. And Ximilar is a computer vision platform that digs deep into some pretty narrow use cases. So while the big solutions might be great in many ways, Ximilar might very well be the agile alternative.

Ximilar offers you a great cloud AI platform for training your custom image recognition models and advanced visual search services.

Ximilar is Not a Big Corporation

And that is a good thing. Because we keep things simple, streamlined, and we have time to listen to each customer’s needs. We also have the ability to implement new custom features in a timely manner. And we do it as fast as we can, widely benefiting both customers and us, freeing our manpower from manual work.

We at Ximilar create, and continuously improve, advanced visual search, image recognition services & image tools for businesses around the World. That happens in few areas:

Ximilar recognition app – AI cloud platform for training custom image recognition models
Ximilar annotate – image annotation tool for creating great datasets
Visual Search – fashion, stock photos, real estate, home decor, cards, … search engines
Ready-to-use services – fashion tagging, home-decor and collectibles recognition API
Image tools – upscale your images and remove the background with API
and in many cases the most challenging custom AI solutions

We are also not an enterprise that requires millions of users of its services to just stay afloat. See for example how many services were killed by Google. No. Rather than growth in quantity, our center of the universe is how precise we get, and how reliable & sustainable results we deliver. And how we can grow strong together with our customers, or we should rather say our partners.

Here is why Ximilar could be a solid alternative for you if you need to iterate quickly and reach reliable results in narrow fields. Or if you simply need someone who takes your idea further and finds an AI solution to deliver value to your business.

1 – We are focused AI Team

We craft our features to perfection, and we test & use them ourselves. We continuously improve our application for everybody to benefit from new findings in AI vision industry. And we also do things that customers ask for, we don’t just sell access to a platform.

2 – We are an independent company

These days, many companies are created to be acquired. They are created to grow no matter the sustainability of such growth. We are different. Our customers like that we would not disappear tomorrow — getting acquired by a giant and then dissolved into some unreachable feature of some huge app suite is not our target.

3 – We innovate faster

We don’t have a large team and therefore decisions are quick. We are a team of remote professionals working in a field that we truly love and would like to explore to the edge of possibilities. It’s a lot of fun to work on our customers’ challenging tasks. And we are happy to customize any feature. The customer’s budget is the only limit.

4 – Save expenses on AI

Our AI solutions are significantly cheaper than the solutions of big AI players. We are able to save you a lot of money on training and deploying your custom models. For example, training and deploying a model on Google Vertex AI can cost you thousands of dollars, without even calling the API. For Vertex AI AutoML models you are paying for training, deploying and calling a model. Similar pricing for features can be applied to Amazon Rekognition and Azure Custom Vision services. With Amazon Rekognition you are also paying for each hour your model is deployed! On the other hand, AI models built via our platform are trained and deployed for free! You are paying just for calling the API. No more hidden costs.

Head-to-Head Comparison

	Focus	Models	3,000 requests, free model training and deployment	Request Price per 1,000 images	Free plan per month	Visual Search	Expert assistance
Ximilar	Custom Image Recognition, Visual & Similarity Search, Tagging	Fashion, Home-Decor, Collectibles, Custom (classification, tagging, detection)	Optional	$1.0	3,000 requests, free model trainings and deployment	Yes	Yes
Microsoft	Image Recognition	Generic, Custom (classification, tagging, detection)	No	$2	10,000 requests, 1 hour of training	No	No
Amazon	Image & Video Recognition	Generic, Face, Sensitive Content, Text, Celebrity, …	No	$1	5,000 requests	Face only	No
Google	Image Recognition	Generic, Faces, Text, Logos, Landmarks	No	$1.5	1,000 requests	No	No
IBM Watson	Image Recognition	Generic, Faces, Food, Explicit, Custom (classification, tagging)	No	$2	1,000 predictions, 2 trainings of models	No	No
Clarifai	Image & Video Recognition, Similarity Search	Generic, Faces, Nudity, (Fashion) Custom (classification, tagging)…	Optional	$1.2 – 3.2	1,000 operations	Yes	Yes

Narrow Field vs. Generic AI

This one is personal. You would see a lot of simple AI applications, like detecting a cat and a dog in a given — well lit & well shot — picture. But in reality, the bread and butter of applied visual AI is narrow field recognition and analysis of large volumes of images, where the customer needs pretty high accuracy on a specific subject. For example, detect a type of screw on a blurry cellphone photo, shot in bad lighting conditions.

Unlike the giants who mostly sell you ready-made solutions that you can hardly bend to meet your needs, Ximilar is in the other end of the spectrum, brainstorming with customers about how to solve the use case that they have. Being their partner in the path to success.

Examples of such narrow use cases are

Detecting coffee grounds in a cup – for a customer who receives millions of images to their mobile app used to foretell the future for its users. You wouldn’t believe how many users in coffee-drinking countries use such an app.

Recognition of trading cards from photo – A cool use-case that was a dream of every geek. Not anymore. Simply snap a photo of a sports card or a game like Pokémon, and the app will identify a card and return a price listed on eBay. You can build your own portfolio tracker and much more with Ximilar.
Give me a quality rating of a photo – this one was brought up by a hotel reservation site and real estate company. They need to detect the best photos of a property, while the photos are often delivered by a re-seller, or a hotel owner and might not be well shot. And we all know that good photos sell better. Ximilar can help even there with upscaling images and improving their quality.

Lower Price for Higher Accuracy

While the examples above might be fun to read, let’s get to real facts, hardcore numbers and actual user feedback. Because that is a requirement for any business to base its thoughts on. Here are some real-life examples of our customer experiences.

Ximilar Recognition is cheaper and has comparable accuracy as Microsoft Custom Vision, Amazon Rekognition, Google Vertex AI and IBM Watson. At least several of our customers, and users of the Ximilar App, achieve even better accuracy than with the big cloud solutions. Ximilar allows users to control various parameters of training from a simple GUI.

Model versioning in Ximilar App.

UX of Ximilar App is extremely easy to use, also reported by our customers, saying: “Ximilar has a shallow learning curve in comparison to others”. Connection to the API and integration to your systems and apps is easy.
Ximilar has advanced features for tuning of your recognition tasks which no other services provide — flips, rotations, etc.

Advanced settings of image augmentations in Ximilar App.

Ximilar Product Similarity and Custom Similarity are unique services for finding visually similar alternatives in fashion, home decor and other image collections
Ximilar is much more flexible as we are willing to improve our service for your needs – e.g. add more tags to our models — according to your requirements and keep it attached to your data exclusively
We are cheaper — Google AutoML Vision/Vertex AI is significantly much more expensive than our solution
Ximilar Fashion Tagging is at the top of abilities in fashion object recognition
Elaborate management of tags & categories for more projects of higher complexity — we are the only system we know of, that enables users to share training data between categorisation and tagging tasks, chaining recognition models into one API…
Ximilar, unlike the big competition, is able to install the system on-premise, giving you better control over the system, do a lot of flexible customizations

This is just a brief summary of what we see as benefits for you if you use Ximilar as your partner for pioneering the AI world. We see it now as really just the beginning of all the possibilities that might come in the future of automation and machine learning abilities. We have been around for many years now and Ximilar would surely be around for the years to come. Backing you on the way. Enjoying the exploration.

How do custom projects work?

The post Is Ximilar Better Than AI Giants? appeared first on Ximilar: Visual AI for Business.