Vectors are one of the most important concepts in machine learning, not least because many bugs come down to matrix/vector dimensions that don't fit together. It is therefore essential for a machine learning engineer to have a good understanding of them.
Vectors
Let's say you're an engineer at Tesla and you get a dataset of produced cars with 3 features: length, width and height.
Each of those cars can be represented as a point within a 3-dimensional space.
A datapoint with, for example, 500 features could be represented in a 500-dimensional space. It is hard for us humans to imagine more than three dimensions, but computers handle it just fine.
A vector is a 1-dimensional array. Think of a vector as a list of values or a row in a table. You could also say that a vector is a matrix with only one row (a row vector) or only one column (a column vector).
A vector of n-elements is an n-dimensional vector, with one dimension for each element.
So for a 3-dimensional datapoint we could use a 1-by-3 array to hold the 3 features. It represents a set of features, which is why we call it a feature vector.
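As a quick sketch in NumPy (the car measurements here are made-up values for illustration):

```python
import numpy as np

# A made-up feature vector for one car: [length, width, height] in metres
car = np.array([4.69, 1.85, 1.44])

print(car.shape)  # (3,) -- a 1-dimensional array holding 3 features
print(car.ndim)   # 1   -- one axis, so it is a vector
```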
Matrix
More general than a vector is a matrix. A matrix is a rectangular array of numbers and a vector can be a row or column within a matrix.
Therefore each row in a matrix could represent a different datapoint. Less general than a vector is a scalar, which is just a single number, but that's another topic.
The dimension of a matrix is written as the number of rows times the number of columns.
To show this more clearly you can see a 4 by 2 matrix below.
Next, let's talk about how to refer to specific elements of a matrix. Matrix elements are simply the entries, the numbers inside the matrix. The picture below shows just that. If you have a matrix called A, then A subscript ij refers to the (i, j) entry, meaning the entry in the i-th row and the j-th column.
Here are some examples for it:
A11 = 1402
A32 = 1437
It is nearly the same with a vector. If you have a vector called A, then A1 would be its first element, A2 its second, and so on.
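A quick check in NumPy. The two entries 1402 and 1437 come from the examples above; the other entries are made-up filler. Also note that NumPy indexes from 0, while the math notation above indexes from 1:

```python
import numpy as np

# A 4-by-2 matrix; 1402 and 1437 match the examples above,
# the remaining entries are arbitrary placeholder values
A = np.array([[1402,   10],
              [  20,   30],
              [  40, 1437],
              [  50,   60]])

print(A.shape)   # (4, 2): 4 rows times 2 columns
print(A[0, 0])   # A11 = 1402 (row 1, column 1 -> index [0, 0])
print(A[2, 1])   # A32 = 1437 (row 3, column 2 -> index [2, 1])
```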
Tensors
The most general term for all of the concepts above is a tensor, because a tensor is a multidimensional array.
A first-order tensor is a vector, a second-order tensor is a matrix, and tensors of order three and above are called higher-order tensors.
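In NumPy terms, the order of a tensor corresponds to its number of axes, which you can read off with `ndim`:

```python
import numpy as np

vector = np.array([1, 2, 3])            # first-order tensor
matrix = np.array([[1, 2], [3, 4]])     # second-order tensor
tensor3 = np.zeros((2, 3, 4))           # third-order (higher-order) tensor

print(vector.ndim, matrix.ndim, tensor3.ndim)  # 1 2 3
```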
For example, you could represent a social graph of friends of friends as a higher-order tensor.
You probably know Google's own library TensorFlow, which lets you build a computational graph where tensors can "flow" through a series of mathematical operations.
As computational power and the amount of available data increase, we become more capable of processing multidimensional data.
Vectors can be represented in many different ways and are used in many fields beyond machine learning, such as physics. For example, in Einstein's theory of relativity the curvature of spacetime is described by the Riemann curvature tensor, an order-4 (and therefore higher-order) tensor.
Any type of data can be represented as a vector, because it can be broken down into a set of numbers. Examples include images, stock prices, videos, text, audio and so on.
A common problem in machine learning is that a model does not accept the data and keeps throwing errors. Often the solution is to vectorize the data, which means nothing more than reshaping it into the required dimensions. A model expects tensors of a certain size, so you need to reshape your input data so that it lives in the right vector space. Vectorization is essentially just a matrix operation.
There is a Python library called NumPy that can do this with a single line of code. For example, to reshape 100 values into a 10-by-10 matrix:
>>> import numpy as np
>>> x = np.arange(100)
>>> x.reshape(10, 10)
Vectors don't just represent data. They help us represent our models too, because many machine learning models store what they have learned as vectors. All types of neural networks do this.
Once data is vectorized we can do a lot with it. A so-called "Word2Vec" model turns words into vectors, and then we can do mathematical operations on them. For example, we can see how closely two words are related by computing the distance between their vectors: the vector for Germany ends up close to the vectors of other wealthy European countries. Similar word vectors are likely to be clustered together. By vectorizing words we capture their semantic meaning numerically.
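A minimal sketch of the idea using toy, hand-made "word vectors" (real Word2Vec embeddings typically have 100 to 300 dimensions; cosine similarity is one common way to measure closeness):

```python
import numpy as np

# Toy 3-dimensional "word vectors" -- invented for illustration,
# NOT the output of a real Word2Vec model
vectors = {
    "germany": np.array([0.9, 0.8, 0.1]),
    "france":  np.array([0.8, 0.9, 0.2]),
    "banana":  np.array([0.1, 0.0, 0.9]),
}

def cosine_similarity(a, b):
    # 1.0 means the two vectors point in exactly the same direction
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Related words should score higher than unrelated ones
print(cosine_similarity(vectors["germany"], vectors["france"]))
print(cosine_similarity(vectors["germany"], vectors["banana"]))
```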
We compute the distance between two vectors by using the notion of a "vector norm". A norm is a function that maps vectors to real numbers and satisfies the following conditions.
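Written out (this is the standard definition), for a norm ‖x‖ the conditions are:

1. ‖x‖ ≥ 0 (non-negativity)
2. ‖x‖ = 0 if and only if x = 0 (definiteness)
3. ‖αx‖ = |α| · ‖x‖ for any scalar α (absolute homogeneity)
4. ‖x + y‖ ≤ ‖x‖ + ‖y‖ (triangle inequality)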
The conditions above mean that lengths are never negative, that only the zero vector has length zero, that scaling a vector scales its length in a predictable way, and that distances add up reasonably (the triangle inequality).
Therefore in the simplest vector space, the real number line, the norm of a vector is its absolute value, and the distance between two numbers x and y is |x − y|.
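A quick check with NumPy (`np.linalg.norm` computes the Euclidean norm by default, which reduces to the absolute value in one dimension):

```python
import numpy as np

x, y = np.array([-3.0]), np.array([4.0])

print(np.linalg.norm(x))      # 3.0 -- the absolute value of -3
print(np.linalg.norm(x - y))  # 7.0 -- the distance between -3 and 4

# The same norm in higher dimensions gives the Euclidean length
v = np.array([3.0, 4.0])
print(np.linalg.norm(v))      # 5.0 -- sqrt(3^2 + 4^2)
```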