Image classification is a task that humans perform naturally and with high accuracy. Teaching machines to do the same is challenging, but machine-driven image classification can bring major improvements in many areas, such as medical diagnosis, public safety and self-driving cars.

Image classification models work by estimating how well a particular image fits each of a set of pre-defined categories. For people who build image classification algorithms (computer scientists, data scientists, image analysis specialists, machine learning data engineers or machine learning signal processing engineers), the goal is to reuse existing models and results instead of starting from scratch every time. That is where pre-trained neural networks can make a real difference.

The Importance of Pre-Trained Models

Human evolution is built on knowledge transfer from one generation to another by language and writing. Similarly, pre-trained neural networks borrow intelligence gathered by previous classification models. It’s a solution to a problem someone else had. Having a pre-trained model is similar to having a subject matter expert. You can just use their know-how as a foundation to build new insights. Of course, some modifications and tuning will be necessary, but a pre-trained model saves a considerable amount of time and research.

Image Classification Challenges

A useful classification algorithm is able to assign an image to a broad class and then to granular sub-classes with excellent accuracy. One of the problems is that, even within the same category, there is high variability. To illustrate this, just think about dog breeds. Some of them could look more like a muffin than another dog.

Another challenge is that most neural network methods are based on supervised learning, so humans must label and classify vast amounts of data. ImageNet, for example, used about 1.3 million labeled examples for 1,000 categories. The aim of using such a large amount of data is to learn from this experience and reuse as much of the accumulated knowledge as possible. Finally, there are numerous pre-trained models to choose from, and finding the right one for your problem can take a lot of guesswork, trial and error.

Building Image Classification Models

An excellent step-by-step approach to the image classification problem is offered by InData Labs. They suggest pre-processing the images to give them a common starting point and to avoid variations due to light, background and angle. The transformations aim to find and highlight the most relevant part of each photo.

InData Labs suggests the following model building steps:

Import the pre-trained neural network and apply average pooling to its output. With average pooling, the output is split into blocks and each block is replaced by the average of the values it contains.
Split your data into training and validation sets at the desired ratio, so that you have enough examples both to teach the system and to check for errors. The example given uses a 3:1 ratio, but other data sets may yield better results with different ratios. Then train a dense model on top of the pre-trained network. The technical details are provided in the original InData Labs paper, yet the advantages are worth mentioning: fast training, convergence in under 100 epochs (rounds of training the neural network) and the fact that overfitting (the inability of the network to generalize beyond the examples it learned from, because the training set is too small) can be eliminated at a later stage.
Add the newly trained model on top of the pre-trained one. A useful tip here is to start with a low learning rate to merge the knowledge of both networks, while aiming for stability and convergence.
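
The first two steps above can be illustrated with a small sketch. It is not the InData Labs implementation: it uses plain NumPy, and the function names average_pool and train_val_split, the 2x2 block size and the fixed random seed are assumptions made only for this example.

```python
import numpy as np

def average_pool(feature_map, block=2):
    """Replace each non-overlapping block x block patch with its mean."""
    h, w = feature_map.shape
    # Trim the edges so the map divides evenly into blocks.
    h, w = h - h % block, w - w % block
    trimmed = feature_map[:h, :w]
    # Axes become (block row, row inside block, block column, column inside block).
    blocks = trimmed.reshape(h // block, block, w // block, block)
    return blocks.mean(axis=(1, 3))

def train_val_split(features, labels, train_fraction=0.75, seed=0):
    """Shuffle and split the data; 0.75 corresponds to a 3:1 ratio."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(features))
    cut = int(len(features) * train_fraction)
    train_idx, val_idx = order[:cut], order[cut:]
    return (features[train_idx], labels[train_idx]), (features[val_idx], labels[val_idx])

# A toy 4x4 "network output" with values 0..15.
feature_map = np.arange(16, dtype=float).reshape(4, 4)
pooled = average_pool(feature_map)  # values: [[2.5, 4.5], [10.5, 12.5]]

# Eight toy samples split 3:1 into six training and two validation samples.
toy_features = np.arange(16).reshape(8, 2)
toy_labels = np.arange(8)
(train_x, train_y), (val_x, val_y) = train_val_split(toy_features, toy_labels)
```

In practice a deep learning framework would provide both the pooling layer and the data split; the sketch only shows what the two operations compute.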

Fine-Tuning Image Classification Models

Building the model is just the first step. To be effective, it needs to be fine-tuned. Fine-tuning is necessary to avoid overfitting, a phenomenon that occurs when the training set is too small and the algorithm is unable to generalize. It consists of feeding the algorithm images that are similar to those used for training but slightly different. The aim is to teach the network to classify new images in the right categories.

One way to fine-tune is to perform data augmentation by applying common image transformations such as rotation, flipping or zooming or by changing contrast, color or brightness. To get consistent results and robustly train the system, make sure you only feed into the neural network images that have one or two alterations compared with the original.
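
As a rough illustration of this kind of augmentation, the sketch below applies one or two randomly chosen alterations to an image stored as a NumPy array with values between 0 and 1. The function name augment, the particular transformations and their strengths (a 90-degree rotation, a 1.2 brightness factor, a 1.5 contrast factor) are assumptions chosen for the example, not recommended settings.

```python
import numpy as np

def augment(image, rng, max_alterations=2):
    """Apply one or two random alterations to an H x W x C image in [0, 1]."""
    transforms = {
        "flip": lambda img: img[:, ::-1],                        # horizontal flip
        "rotate": lambda img: np.rot90(img, k=1, axes=(0, 1)),   # 90-degree rotation
        "brightness": lambda img: np.clip(img * 1.2, 0.0, 1.0),  # brighten by 20%
        "contrast": lambda img: np.clip(
            (img - img.mean()) * 1.5 + img.mean(), 0.0, 1.0      # stretch around the mean
        ),
    }
    # Pick how many alterations to apply (1 up to max_alterations), then which ones.
    count = int(rng.integers(1, max_alterations + 1))
    chosen = [str(name) for name in rng.choice(list(transforms), size=count, replace=False)]
    out = image
    for name in chosen:
        out = transforms[name](out)
    return out, chosen

rng = np.random.default_rng(0)
original = rng.random((8, 8, 3))          # a random stand-in for a real photo
altered, applied = augment(original, rng)
```

The returned list of names records which alterations were applied, which makes it easy to check that no image drifts too far from its original.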

Another way to get an improvement is to use images of excellent quality. Ask humans to rate them for accuracy before feeding into the system and remove those with the lowest scores.

You can also re-train a pre-trained model by keeping some of the neurons in the lower layers and modifying the upper ones. This keeps the main “knowledge” of the network, bringing only small alterations. A simple comparison would be with a song, keeping the tune and tempo and changing some of the words.
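
The idea of keeping the lower layers and re-training the upper ones can be sketched with a toy two-layer network in NumPy. Everything here is an assumption for illustration: the random matrix W_lower merely stands in for pre-trained weights, the labels are synthetic, and the learning rate and number of steps are arbitrary. A real project would do this with a deep learning framework and an actual pre-trained model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a pre-trained lower layer: these weights stay frozen.
W_lower = rng.normal(size=(8, 4))
# New upper layer (the "head") that is re-trained for the new task.
W_upper = np.zeros((4, 1))

# Synthetic data whose labels depend on the frozen layer's features.
X = rng.normal(size=(64, 8))
features = np.maximum(X @ W_lower, 0.0)
y = (features @ np.array([[1.0], [-1.0], [0.5], [0.0]]) > 0).astype(float)

def forward(X, W_lower, W_upper):
    hidden = np.maximum(X @ W_lower, 0.0)               # ReLU features from the frozen layer
    probs = 1.0 / (1.0 + np.exp(-(hidden @ W_upper)))   # sigmoid output of the head
    return hidden, probs

def log_loss(probs, y):
    eps = 1e-9
    return float(-np.mean(y * np.log(probs + eps) + (1 - y) * np.log(1 - probs + eps)))

W_lower_before = W_lower.copy()
loss_before = log_loss(forward(X, W_lower, W_upper)[1], y)

learning_rate = 0.01  # a deliberately low rate, for stable updates
for _ in range(200):
    hidden, probs = forward(X, W_lower, W_upper)
    grad_upper = hidden.T @ (probs - y) / len(X)
    W_upper -= learning_rate * grad_upper  # only the upper layer is updated

loss_after = log_loss(forward(X, W_lower, W_upper)[1], y)
```

Because the gradient step touches only W_upper, the "knowledge" stored in W_lower is preserved while the head adapts to the new labels.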

Conclusion

Image classification models can help improve various areas, from medicine to public security. The challenges are related to algorithm accuracy and building knowledge databases that can be reused.

Right now, we are still in the era of neural networks in which computers are like children, learning from books with pictures. The great news is that not only are computers fast learners, but they can also learn from one another as simply as through a copy-paste operation. The tool for doing so is the pre-trained neural network, which can be adjusted to new problems while preserving existing learning.

The not-so-great news is the rise of images that are easy to classify by humans but that present problems to the algorithms. This suggests that there is still room for improvement and fine-tuning of the algorithms.