Contributed by Surya Prabhakaran
In this article, we are going to explore the basics of OpenVINO toolkit.
I was selected for Udacity’s Intel Edge AI Scholarship Program, which provided a brief introduction to the basics of the Intel OpenVINO toolkit. After successfully completing the foundational course, I progressed to the Intel Edge AI for IoT Developers Nanodegree Program.
Before diving into OpenVINO, let’s start with a short introduction to AI at the Edge.
What is AI at the Edge?
Edge, in simple words, means “local processing”: the device does not need to send data to the cloud for processing and wait for the results to be sent back. Everything (data pre-processing, optimization, inference and prediction) happens locally. Edge AI algorithms can still be trained in the cloud, but they run at the edge.
The advantages of Edge Computing are:-
- No network communication required
Since everything in an edge application happens locally, there is no need for any network communication. Network communications are also expensive (power consumption, bandwidth, etc.) and sometimes difficult (remote locations).
- Security of sensitive data
The data present in edge applications can be personal data, health data or business data that could be sensitive if sent to the cloud.
- No latency considerations
Latency is the time taken for the data to reach the cloud, get processed and return to the device. Since no network communication is required here, there are no latency concerns.
The main aim of the OpenVINO toolkit is to deploy complex Computer Vision algorithms in an edge device.
What is OpenVINO?
OpenVINO stands for Open Visual Inference and Neural Network Optimization. OpenVINO is an open-source toolkit provided by Intel which focuses on optimizing neural network inference. This toolkit helps developers to create cost-effective and robust computer vision applications.
OpenVINO accelerates AI workloads, including computer vision, audio, speech, language, and recommendation systems. It speeds up time to market via a library of functions and pre-optimized kernels. It also includes optimized calls for OpenCV, OpenCL™ kernels, and other industry tools and libraries.
Download and install OpenVINO
You can download OpenVINO from the Intel website.
The steps are:-
- Train a model.
- Feed the model to the Model Optimizer, which optimizes the model and generates an Intermediate Representation (.xml + .bin) of the model.
- Feed the Intermediate Representation to the Inference Engine, which checks the compatibility of the model with the target environment (hardware) and runs the inference.
- Deploy the application.
The two main components of the OpenVINO toolkit are
- Model Optimizer
- Inference Engine
The Model Optimizer converts models from several different frameworks into an Intermediate Representation. The Intermediate Representation (IR) of a model consists of an .xml file and a .bin file.
- .xml -> Contains the model architecture and other important metadata.
- .bin -> Contains the weights and biases of the model in a binary format.
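To make the IR format more concrete, here is a small sketch (Python standard library only) that pulls the layer names and types out of an IR-style .xml file. The inline XML is a simplified, hypothetical fragment for illustration; a real IR produced by the Model Optimizer contains far more detail (edges, shapes, precisions).

```python
import xml.etree.ElementTree as ET

# A simplified, hypothetical fragment in the style of an IR .xml file.
ir_xml = """
<net name="tiny_model" version="10">
    <layers>
        <layer id="0" name="input" type="Parameter"/>
        <layer id="1" name="conv1" type="Convolution"/>
        <layer id="2" name="output" type="Result"/>
    </layers>
</net>
"""

root = ET.fromstring(ir_xml)
layers = [(l.get("name"), l.get("type")) for l in root.find("layers")]
print(layers)  # [('input', 'Parameter'), ('conv1', 'Convolution'), ('output', 'Result')]
```

The .bin file, by contrast, is an opaque blob of weights; the .xml tells the Inference Engine how to interpret it.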
Frameworks supported by OpenVINO:-
- TensorFlow
- Caffe
- MXNet
- ONNX (PyTorch and Apple ML models can be exported to ONNX)
- Kaldi
OpenVINO has a lot of pre-trained models in the model zoo for several purposes like:-
- Object Detection
- Object Recognition
- Pose Estimation
You can find the public model set in the Open Model Zoo.
OpenVINO provides some optimization techniques which make the model smaller and quicker. In this article, I will be focusing on three optimization techniques:-
- Quantization (lowering precision)
- Freezing
- Fusion
The weights and biases of the pre-trained models in OpenVINO come in different precisions:-
- FP32- Floating Point 32-bit
- FP16- Floating Point 16-bit
- INT8- Integer 8-bit
A higher-precision model may give higher accuracy, but it is more complex, occupies more space and requires more computational power. A lower-precision model, on the other hand, occupies less space and requires less computation. However, lowering the precision affects accuracy: as we reduce the precision we lose some information, which causes a reduction in accuracy.
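The information loss can be illustrated with the Python standard library alone: the struct module can round-trip a value through IEEE 754 half precision, the same 16-bit format as FP16. This is a standalone sketch of the numeric effect, not an OpenVINO API; the weight value is made up.

```python
import struct

def to_fp16(x):
    """Round-trip a Python float through IEEE 754 half precision (FP16)."""
    return struct.unpack("e", struct.pack("e", x))[0]

weight = 0.1234567            # a made-up FP32-style weight
fp16_weight = to_fp16(weight)

print(fp16_weight)            # roughly 0.1235 -- the low-order bits are gone
print(weight - fp16_weight)   # small but non-zero: the lost information
```

Values that fit the smaller format exactly (powers of two, for instance) survive unchanged; most learned weights do not, which is why accumulated rounding can nudge accuracy down.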
Freezing in this context is different than what you may think of freezing in training neural networks. In training, this means freezing certain layers so that you can fine-tune and train on only a subset of layers. Here, it is used in the context of an entire model and Tensorflow models in particular. Freezing TensorFlow models will remove certain operations and metadata only needed for training. For example, backpropagation is only required while training and not while inferencing. Freezing a TensorFlow model is usually a good idea whether before performing direct inference or converting with the Model Optimizer.
Fusion, as the name suggests, is the process of combining or fusing multiple layer operations into a single operation.
For example, a batch normalization layer, activation layer, and convolutional layer could be combined into a single operation. This can be particularly useful for GPU inference, because in a GPU separate operations may occur on separate GPU kernels, while a fused operation occurs on one kernel, thereby incurring less overhead in switching from one kernel to the next.
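The arithmetic behind folding a batch normalization layer into the preceding convolution can be sketched with plain Python numbers: the BN scale and shift are absorbed into the convolution’s weight and bias, so only one operation remains at inference time. The per-channel values below are made up for illustration, and a single scalar stands in for a full weight tensor.

```python
import math

# Hypothetical per-channel parameters (made-up values).
w, b = 0.8, 0.1                    # convolution weight and bias
gamma, beta = 1.2, 0.05            # batch norm scale and shift
mean, var, eps = 0.3, 0.25, 1e-5   # batch norm running statistics

def conv_then_bn(x):
    """Unfused: convolution followed by batch normalization (two operations)."""
    y = w * x + b
    return gamma * (y - mean) / math.sqrt(var + eps) + beta

# Fused: fold the batch norm into the conv's weight and bias once, offline.
scale = gamma / math.sqrt(var + eps)
w_fused = w * scale
b_fused = (b - mean) * scale + beta

def fused_conv(x):
    """One operation at inference time, numerically identical."""
    return w_fused * x + b_fused

x = 0.7
print(conv_then_bn(x), fused_conv(x))  # identical up to float rounding
```

The fused form does the same math with one multiply-add per element, which is exactly the kernel-switching saving described above.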
Model optimization does not mean an increase in accuracy; by optimizing a model you reduce its size and make it faster.
The Inference Engine, as the name suggests, runs the actual inference on the model. It only works with Intermediate Representations (IR) that come from the Model Optimizer, or with the Intel pre-trained models which are already provided in IR format.
Like the Model Optimizer, which provides improvements on the basis of size and complexity of the models to improve memory and computation times, the Inference Engine provides hardware-based optimizations to get further improvements in the model.
The Inference Engine itself is actually built in C++, leading to overall faster operations; however, it is very common to utilize the built-in Python wrapper to interact with it in Python code.
The supported devices for the Inference Engine are all Intel hardware:-
- CPU (Central Processing Unit)
- GPU (Graphics Processing Unit)
- NCS-2 (Neural Compute Stick)
- FPGA (Field Programmable Gate Array)
The Inference Engine has two classes:-
- IECore -> The Python wrapper to work with the Inference Engine.
- IENetwork -> Holds the model read from the .xml and .bin files; it is then loaded into the IECore.
After loading the IENetwork to IECore successfully, you will get an Executable Network, to which you will send Inference Requests.
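Put together, a typical loading flow with the classic (pre-2022) Python API looks roughly like the sketch below. This assumes the openvino package is installed; the IR file paths are hypothetical, and the final inference call is shown commented out since it needs a real pre-processed input.

```python
from openvino.inference_engine import IECore

ie = IECore()  # the Python wrapper around the Inference Engine

# Read the IR produced earlier by the Model Optimizer (hypothetical paths).
net = ie.read_network(model="model.xml", weights="model.bin")

# Loading the network onto a device yields an ExecutableNetwork,
# to which inference requests are sent.
exec_net = ie.load_network(network=net, device_name="CPU")

# Input names come from the IR itself.
input_name = next(iter(net.input_info))

# A synchronous request would then look like:
# result = exec_net.infer(inputs={input_name: preprocessed_frame})
```

Swapping `device_name` for `"GPU"` or `"MYRIAD"` retargets the same IR to other Intel hardware, which is the hardware-based optimization story described above.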
There are two types of Inference Requests:-
- Synchronous
- Asynchronous
In the case of Synchronous Inference, the system waits and remains idle until the inference response is returned (blocking the main thread). Only one frame is processed at a time, and the next frame cannot be gathered until the current frame’s inference is complete.
As you might have guessed, in the case of Asynchronous Inference, if the response for a particular request takes a long time, you don’t hold everything up; you continue with other work while the current request is executing. Asynchronous Inference can therefore deliver higher throughput than Synchronous Inference.
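The difference between the two request types can be sketched with plain Python threads standing in for the Inference Engine’s blocking infer() versus start_async()/wait() calls. fake_inference below is a stand-in that simulates inference with a short sleep; no OpenVINO API is used.

```python
import time
import threading

def fake_inference(frame):
    """Stand-in for a real inference call (simulated with a short sleep)."""
    time.sleep(0.05)
    return f"result for {frame}"

def run_sync(frames):
    """Synchronous: the main thread blocks on each frame in turn."""
    return [fake_inference(f) for f in frames]

def run_async(frames):
    """Asynchronous: start all requests, keep working, collect results later."""
    results = {}
    def worker(f):
        results[f] = fake_inference(f)
    threads = [threading.Thread(target=worker, args=(f,)) for f in frames]
    for t in threads:
        t.start()      # requests are now in flight...
    # ...the main thread is free here to grab the next frame, update a UI, etc.
    for t in threads:
        t.join()       # the wait() equivalent: block only when results are needed
    return [results[f] for f in frames]

frames = ["frame0", "frame1", "frame2"]
print(run_sync(frames))
print(run_async(frames))  # same results; the waits overlap instead of stacking up
```

With three frames, the synchronous version pays for three waits back to back, while the asynchronous version overlaps them, which is where the throughput gain comes from.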
Thank you so much for reading this article. I hope by now you have a basic understanding of what OpenVINO is.