An Introduction to Intel OpenVINO Toolkit


Surya Prabhakaran


In this article, we are going to explore the basics of the Intel OpenVINO toolkit.

I was selected for Udacity's Intel Edge AI Scholarship Program. After successfully completing the foundational course, which provided a brief introduction to the basics of the Intel OpenVINO toolkit, I progressed to the Intel Edge AI for IoT Developers Nanodegree Program.

Intel OpenVINO

Before diving into OpenVINO, let's take a quick look at what AI at the Edge means.

What is AI at the Edge?

Edge, in simple words, means local processing: the device does not need to send data to the cloud, have it processed there, and receive the results back. Everything (data pre-processing, optimization, inference, and prediction) happens locally. Edge AI algorithms can still be trained in the cloud, but they run at the edge.

The advantages of edge computing are:

  1. Reduced network communication. Since everything in an edge application happens locally, there is no need for network communication, which is expensive (power consumption, bandwidth, etc.) and sometimes difficult (remote locations).

  2. Privacy. The data present in edge applications can be personal data, health data, or business data that could be sensitive if sent to the cloud.

  3. Low latency. Latency is the time taken for data to reach the cloud, get processed, and return to the device. Since no network communication is required at the edge, there are no latency concerns.

The main aim of the OpenVINO toolkit is to deploy complex computer vision algorithms on edge devices.

What is OpenVINO?

OpenVINO stands for Open Visual Inference and Neural Network Optimization. OpenVINO is an open-source toolkit provided by Intel which focuses on optimizing neural network inference. This toolkit helps developers to create cost-effective and robust computer vision applications.

OpenVINO accelerates AI workloads, including computer vision, audio, speech, language, and recommendation systems. It speeds up time to market via a library of functions and pre-optimized kernels. It also includes optimized calls for OpenCV, OpenCL™ kernels, and other industry tools and libraries.

Download and install OpenVINO

You can download the Intel Distribution of OpenVINO toolkit from Intel's website, which also provides step-by-step installation instructions for each supported operating system.

OpenVINO Workflow


The steps are:

  1. Train a model (typically in the cloud, using a framework such as TensorFlow or Caffe).
  2. Convert the trained model to an Intermediate Representation (IR) with the Model Optimizer.
  3. Run inference on the IR with the Inference Engine.
  4. Deploy the application at the edge.

The two main components of the OpenVINO toolkit are the Model Optimizer and the Inference Engine.

Model Optimizer

The Model Optimizer converts models from several different frameworks into an Intermediate Representation. The Intermediate Representation (IR) of a model consists of an .xml file (the network topology) and a .bin file (the weights and biases).

Frameworks supported by OpenVINO include TensorFlow, Caffe, MXNet, ONNX, and Kaldi.

Converting a pre-trained model is a straightforward process: configure the Model Optimizer for the framework, then convert the model with a one-line command.
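For example, converting a frozen TensorFlow model might look like the following (the paths, input shape, and model name here are illustrative; check the Model Optimizer documentation for your version and installation):

```shell
# Hypothetical paths -- adjust to your installation and model.
python /opt/intel/openvino/deployment_tools/model_optimizer/mo.py \
    --input_model frozen_inference_graph.pb \
    --input_shape "[1,300,300,3]" \
    --data_type FP16 \
    --output_dir ./ir_model
```

This would produce `frozen_inference_graph.xml` and `frozen_inference_graph.bin` in `./ir_model`, ready for the Inference Engine.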

OpenVINO also ships with many pre-trained models in its model zoo, covering purposes such as object detection, pose estimation, text detection, and semantic segmentation. You can find the public model set in the Open Model Zoo repository.

OpenVINO provides several optimization techniques that make a model smaller and faster. In this article, I will focus on three of them:

  1. Quantization
  2. Freezing
  3. Fusion


Quantization

The weights and biases of the pre-trained models in OpenVINO come in different precisions: FP32, FP16, and INT8. A higher-precision model may deliver higher accuracy, but it occupies more space and requires more compute power. A lower-precision model occupies less space and requires less compute power; however, lowering the precision discards some information, which can reduce accuracy. Quantization is the process of reducing a model's precision, for example from FP32 to FP16 or INT8.
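As a rough illustration of the idea (a minimal sketch in NumPy, not OpenVINO's actual implementation), here is symmetric linear quantization of FP32 weights to INT8:

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric linear quantization of FP32 weights to INT8."""
    scale = np.abs(weights).max() / 127.0  # map the largest magnitude to 127
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate FP32 weights from the INT8 representation."""
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
# Reconstruction is close but not exact: precision was traded for size.
print(np.abs(w - w_hat).max())
```

The INT8 tensor needs a quarter of the storage of the FP32 original, which is exactly the size/accuracy trade-off described above.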


Freezing

Freezing in this context is different from freezing in neural network training. In training, freezing means fixing certain layers so that you fine-tune and train only a subset of layers. Here, it applies to an entire model, and to TensorFlow models in particular. Freezing a TensorFlow model removes certain operations and metadata that are only needed for training; for example, backpropagation is required during training but not during inference. Freezing a TensorFlow model is usually a good idea, whether before performing direct inference or before converting it with the Model Optimizer.


Fusion

Fusion, as the name suggests, is the process of combining multiple layer operations into a single operation.

For example, a batch normalization layer, activation layer, and convolutional layer could be combined into a single operation. This is particularly useful for GPU inference: separate operations may run on separate GPU kernels, while a fused operation runs on one kernel, incurring less overhead from switching between kernels.
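The arithmetic behind this kind of fusion can be sketched in NumPy (a simplified example folding batch normalization into a preceding linear layer; this is an illustration of the concept, not OpenVINO's internal code):

```python
import numpy as np

def fuse_linear_bn(W, b, gamma, beta, mean, var, eps=1e-5):
    """Fold a batch-norm layer into the preceding linear layer.

    y = gamma * ((W @ x + b) - mean) / sqrt(var + eps) + beta
    becomes y = W_fused @ x + b_fused -- a single operation.
    """
    scale = gamma / np.sqrt(var + eps)   # per-output-channel scale
    W_fused = W * scale[:, None]         # scale each output row of W
    b_fused = (b - mean) * scale + beta
    return W_fused, b_fused

rng = np.random.default_rng(0)
W, b = rng.normal(size=(3, 5)), rng.normal(size=3)
gamma, beta = rng.normal(size=3), rng.normal(size=3)
mean, var = rng.normal(size=3), rng.random(3) + 0.1
x = rng.normal(size=5)

# Unfused: two operations at inference time.
y_ref = gamma * ((W @ x + b) - mean) / np.sqrt(var + 1e-5) + beta
# Fused: one matmul plus bias.
W_f, b_f = fuse_linear_bn(W, b, gamma, beta, mean, var)
y_fused = W_f @ x + b_f
print(np.allclose(y_ref, y_fused))  # the fused op matches the two-step version
```

Because the fused weights are computed once, ahead of time, inference pays for one operation instead of two.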

Note that model optimization does not increase accuracy: by optimizing a model you reduce its size and make it faster, sometimes at a small cost in accuracy.

Inference Engine

The Inference Engine, as the name suggests, runs the actual inference on the model. It only works with Intermediate Representations (IR) that come from the Model Optimizer, or with the Intel pre-trained models, which are already provided in IR format.

While the Model Optimizer reduces a model's size and complexity to improve memory use and computation time, the Inference Engine provides hardware-based optimizations that squeeze further improvements out of the model.

The Inference Engine itself is actually built in C++, leading to overall faster operations; however, it is very common to utilize the built-in Python wrapper to interact with it in Python code.

The supported devices for the Inference Engine are all Intel hardware: CPUs, GPUs (integrated graphics), FPGAs, and VPUs such as the Intel Neural Compute Stick 2.

The Inference Engine has two main classes: IECore and IENetwork. After successfully loading an IENetwork into the IECore, you get an Executable Network, to which you send inference requests.
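A minimal sketch of this flow with the classic `openvino.inference_engine` Python API (the model paths and input shape below are placeholders, and the exact attribute names vary slightly between OpenVINO releases, so treat this as a sketch rather than a drop-in implementation):

```python
import numpy as np

try:
    from openvino.inference_engine import IECore
    HAVE_OPENVINO = True
except ImportError:  # OpenVINO not installed; the sketch still illustrates the flow
    HAVE_OPENVINO = False

def run_inference(model_xml, model_bin, frame):
    """Load an IR model and run one synchronous inference request."""
    ie = IECore()                                            # 1. create the core
    net = ie.read_network(model=model_xml, weights=model_bin)  # 2. read the IR
    exec_net = ie.load_network(network=net, device_name="CPU")  # 3. executable network
    input_name = next(iter(net.input_info))                  # 4. first input blob name
    return exec_net.infer({input_name: frame})               # 5. send the request

# Usage (with a real IR pair on disk):
#   out = run_inference("model.xml", "model.bin",
#                       np.zeros((1, 3, 224, 224), dtype=np.float32))
```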

There are two types of inference requests: synchronous and asynchronous.


Synchronous Inference

In the case of synchronous inference, the system waits and remains idle until the inference response is returned (blocking the main thread). Only one frame is processed at a time, and the next frame cannot be gathered until the current frame's inference is complete.


Asynchronous Inference

As you might have guessed, with asynchronous inference, if the response to a particular request takes a long time, you don't hold up the application; other work continues while the inference executes. Because requests overlap instead of blocking one another, asynchronous inference can achieve higher throughput than synchronous inference.
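The difference can be illustrated in plain Python with a thread pool (a conceptual sketch only; in OpenVINO itself, `start_async` and `wait` on the executable network's requests play the analogous roles, and `fake_inference` here is a hypothetical stand-in for a model call):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_inference(frame):
    """Hypothetical stand-in for an inference call that takes ~50 ms."""
    time.sleep(0.05)
    return f"result for {frame}"

frames = ["frame0", "frame1", "frame2", "frame3"]

# Synchronous: block on each request before starting the next.
t0 = time.perf_counter()
sync_results = [fake_inference(f) for f in frames]
sync_time = time.perf_counter() - t0

# Asynchronous: submit all requests, then collect results later.
t0 = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(fake_inference, f) for f in frames]
    async_results = [fut.result() for fut in futures]
async_time = time.perf_counter() - t0

print(sync_results == async_results)  # same results either way
print(async_time < sync_time)         # but the requests overlapped
```

The results are identical; only the scheduling differs, which is why asynchronous requests improve throughput for streaming workloads like video.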

Thank you so much for reading this article. I hope you now have a basic understanding of what OpenVINO is.


