
The eye is one of the human senses, like the ears, nose, tongue, and skin, and it is responsible for vision. Most of the real-world inputs to these sensors arrive as images, text, and audio. An optical illusion can make the eye perceive an image as coloured when in actuality it is black and white.





Channels / Kernels:



Digital photographs are stored in 3 channels (RGB). The same image printed in a newspaper uses 4 colours (C, M, Y, K), and in a magazine it may be separated into 7 channels. A portrait can be built from millions of distinct colours. DNNs typically work on 3 channels (RGB or LAB colour space), but a DNN does not really depend on colour: training on black-and-white images gives results comparable to training on colour images. It is recommended to split an image into as many channels as possible, and this should be the first step in a DNN: decompose a single object into multiple channels, because more channels mean more features.

Why only a 3 x 3 kernel? The 3 x 3 kernel is heavily accelerated in hardware, a choice driven by researchers and NVIDIA; increasing the kernel size puts a lot of pressure on the hardware. Channel, kernel, and feature extractor are used here as names for the same idea.
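As a minimal sketch of channels and 3 x 3 kernels in code (PyTorch is assumed here; the post itself does not name a framework), an RGB image is a 3-channel tensor, and a 3 x 3 convolution turns those 3 input channels into any number of output channels, each acting as one feature extractor:

import torch
import torch.nn as nn

# A dummy RGB image: batch of 1, 3 channels (R, G, B), 224 x 224 pixels.
image = torch.rand(1, 3, 224, 224)

# A 3x3 convolution that turns 3 input channels into 32 output channels.
# Each of the 32 kernels acts as one feature extractor.
conv = nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3)

features = conv(image)
print(features.shape)  # torch.Size([1, 32, 222, 222]) -- 32 feature channels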



For example, a feature extractor for English text could have 26 channels: each letter of the alphabet becomes its own channel, and those letters are the features. Extracting a particular feature, such as the letter 'e', needs a filter that passes only 'e' and mutes all other features. Another example is a music orchestra, which has as many channels as musical instruments: the sound of each instrument is carried on a different channel.




How many kernels do we need? Adding many kernels (more output channels) from the beginning gives superior performance, but the computation also increases.
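A rough way to see that cost is to count the parameters of a single convolution layer; the layer sizes below are made up for illustration:

def conv_params(in_channels, out_channels, kernel_size=3):
    # Weights: out * in * k * k, plus one bias per output channel.
    return out_channels * in_channels * kernel_size * kernel_size + out_channels

# Doubling the number of kernels (output channels) roughly doubles the
# parameters and the multiply-accumulate work of the layer.
print(conv_params(3, 32))   # 896
print(conv_params(3, 64))   # 1792
print(conv_params(3, 128))  # 3584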

Different kernels:

[[0 0 0] [0 1 0] [0 0 0]] ==> identity kernel

[[0 0 0] [1 1 1] [0 0 0]] ==> horizontal kernel

[[0 1 0] [0 1 0] [0 1 0]] ==> vertical kernel
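To see what these kernels do, here is a small sketch that convolves them over a toy image with SciPy; the image is a made-up 5 x 5 array with a bright vertical stripe in the middle:

import numpy as np
from scipy.signal import convolve2d

identity   = np.array([[0, 0, 0], [0, 1, 0], [0, 0, 0]])
horizontal = np.array([[0, 0, 0], [1, 1, 1], [0, 0, 0]])
vertical   = np.array([[0, 1, 0], [0, 1, 0], [0, 1, 0]])

# A toy 5x5 grayscale "image" with a bright vertical stripe in the middle.
image = np.zeros((5, 5))
image[:, 2] = 1.0

print(convolve2d(image, identity, mode="valid"))    # stripe passes through unchanged
print(convolve2d(image, vertical, mode="valid"))    # stripe responds strongly
print(convolve2d(image, horizontal, mode="valid"))  # weaker, spread-out response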



Channels vs Features:

Features (edges and gradients) matter more than colour. For text, a channel holds every occurrence of the letter 'e', while a feature is a single letter 'e'. Kernel values have to be matched to what the network needs, and a kernel makes its feature stand out (brighter) by convolving over the image.



Receptive field:

The receptive field (RF) is what a convolution "sees". Convolving a 5x5 image with a 3x3 kernel gives a 3x3 output, and every 3x3 convolution increases the receptive field by 2. So roughly 200 such layers are needed before a single output value can see an entire image around 400 pixels wide. After two 3x3 convolutions, the global receptive field (measured on the original 5x5 image) is 5x5, while the local receptive field (measured on the layer just before) is still 3x3.
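A quick sketch of this receptive-field arithmetic (each 3 x 3 convolution with stride 1 grows the receptive field by 2):

# Receptive field after stacking n 3x3 convolutions (stride 1, no pooling):
# it starts at 1 and grows by 2 per layer.
def receptive_field(num_layers, kernel_size=3):
    return 1 + num_layers * (kernel_size - 1)

print(receptive_field(1))    # 3   -- one 3x3 layer sees a 3x3 patch
print(receptive_field(2))    # 5   -- two layers see a 5x5 patch
print(receptive_field(200))  # 401 -- about 200 layers to "see" a ~400-pixel image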


Receptive Field by Stanford

Our brain has four layers and processes vision through exactly the same stages as a neural network (a sketch follows the list):

  • Edges and gradients

  • Textures and patterns

  • Parts of objects

  • Objects
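As a rough sketch of this hierarchy (the layer widths are hypothetical and PyTorch is assumed), a CNN stacks convolution blocks that loosely correspond to these four stages:

import torch.nn as nn

# Hypothetical four-block CNN; the comments map each block to the stages above.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),   # block 1: edges and gradients
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),  # block 2: textures and patterns
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),  # block 3: parts of objects
    nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), # block 4: objects
)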

We will learn more about CNNs in the next section.


The human brain is central to how we visualise the world. It contains approximately a hundred billion neurons and glia. Neurons are the building blocks of the brain; they receive and send chemical signals and help us sense the environment around us. 'Glia' is Greek for 'glue'. Glia assist brain signalling alongside the neurons: they can act as insulation that speeds up signal transmission, and they can also act as immune cells for non-responding neurons in the brain. Artificial intelligence follows the same mathematics.


Open Brain

How does the brain handle visuals? Our eyes act as cameras that capture real-world images, which are passed to the back of the brain for detection and recognition. Nerve cells in the retina convert the captured light into electrical impulses, and these impulses travel along the optic nerve to the back of the brain, to an area called the primary visual cortex. Visual perception depends on light intensity rather than on colour, and the same approach is followed in computer vision.


Eye
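In computer vision this is commonly mirrored by collapsing a colour image into a single intensity (luminance) channel; a minimal sketch using the standard BT.601 luminance weights, which the post itself does not prescribe:

import numpy as np

def to_intensity(rgb):
    """Convert an H x W x 3 RGB array to a single luminance channel."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    return 0.299 * r + 0.587 * g + 0.114 * b

image = np.random.rand(4, 4, 3)   # dummy RGB image
print(to_intensity(image).shape)  # (4, 4) -- intensity only, colour discarded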

How the brain builds visual function was studied in newborn kittens with one eye closed, in experiments carried out by David Hubel and Torsten Wiesel in 1964. They hypothesised that there is a period during which the visual nerve cells develop: the retina must receive visual information during that development, and if it does not, the cells of the visual cortex redistribute their responses in favour of the working eye. Closing one eye of a newborn kitten cut off visual input to that retina. One conclusion drawn from this experiment is that some brain cells are mostly focused on detecting vertical lines, while other brain cells are focused on detecting other patterns of an object.


David Hubel and Torsten Wiesel Experiment

With these findings, computer scientists developed a mathematical model that works in the same way as biological neurons. This model is known as the artificial neuron. An artificial neuron assigns a weight to each input stream and produces an output based on a mathematical computation over the inputs and their weights. A layer of a neural network can contain any number (n) of such neurons, and a network can stack any number (m) of layers. A convolutional neural network is one such network widely used for computer vision, because of its computational efficiency and spatial invariance. You will get to know more about convolutional neural networks in the CV X.X theory part.


Perceptron
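A minimal sketch of such an artificial neuron, a weighted sum of the inputs plus a bias passed through a simple step activation (the numbers here are made up):

import numpy as np

def neuron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through a step activation."""
    total = np.dot(inputs, weights) + bias
    return 1.0 if total > 0 else 0.0

inputs  = np.array([0.5, -1.0, 2.0])       # made-up input stream
weights = np.array([0.8, 0.2, 0.1])        # one weight per input
print(neuron(inputs, weights, bias=-0.1))  # 1.0 -- the neuron "fires"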

From here, the idea of computer vision tasks such as object detection started. Many algorithms today are built on this principle of pattern recognition, and artificial intelligence or machine learning can be understood as a pattern recogniser. Artificial intelligence, as described in Britannica, is a system endowed with the intellectual processes characteristic of humans, such as the ability to reason, discover meaning, generalise, or learn from past experience.


Humanoid robot with AI

This is how human vision has been replicated in computer vision, with the help of biologists and computer scientists. Mathematicians have been extracting information from digital images for a long time. Today, computer vision is one of the most common technologies used across domains, and new fields keep emerging where it can play a crucial role. A few of its applications are in medicine, machine vision, the military, autonomous vehicles, etc.
