New paths in teaching artificial intelligence
Artificial intelligence and machine learning require training on enormous datasets so that programs can learn to recognize certain features of the data. In many cases, however, datasets comprising millions of data points are not available. In such cases, the data need to be multiplied, or "augmented". Training the control system of a self-driving car is an illustrative example: some of the – mostly insignificant – parts of the images describing traffic situations are blurred or smoothed, so that many images can be obtained from a single original. This method, called Gaussian blurring or Gaussian smoothing, is also used by popular image editing programs.
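As a rough illustration of how Gaussian blurring multiplies image data, the sketch below blurs a grayscale image with different smoothing strengths to produce several augmented copies. The kernel size, sigma values, and the random test image are illustrative assumptions, not details from the article.

```python
import numpy as np

def gaussian_kernel(size=5, sigma=1.0):
    """Build a normalized 2-D Gaussian kernel."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
    return k / k.sum()

def gaussian_blur(image, size=5, sigma=1.0):
    """Blur an image by direct convolution with a Gaussian kernel."""
    k = gaussian_kernel(size, sigma)
    pad = size // 2
    padded = np.pad(image, pad, mode="edge")
    out = np.empty_like(image, dtype=float)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            out[i, j] = np.sum(padded[i:i + size, j:j + size] * k)
    return out

# Varying sigma yields several augmented copies of one original image.
image = np.random.rand(32, 32)
augmented = [gaussian_blur(image, sigma=s) for s in (0.5, 1.0, 2.0)]
```

In practice, image libraries provide optimized blur routines; the point here is only that each smoothing strength turns one original image into an additional training sample.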
However, the method cannot be applied to biological and chemical structures and formulas. Hungarian scientists have found a solution to this problem: researchers at the Protein Information Technology Bioinformatics Group at the ELTE Institute of Mathematics – László Keresztes, Evelin Szögi, and Bálint Varga, supervised by Professor Vince Grolmusz – have developed a method called Newtonian blurring, which can also be used to multiply non-image data.
The new method of Newtonian blurring varies the correction mechanism of the data: for greater reliability, quantities are measured or computed several times and their average is used. For example, if something is measured ten times, and every possible set of 7 of the 10 measurements is selected and averaged separately, the data can be augmented as many times as 7 measurements can be chosen from 10 – in this example, 120 times. Unlike Gaussian blurring, the method of the ELTE researchers does not introduce artificial "noise" but intervenes in the data correction itself: the quality of each augmented data point is better than that of an individual measurement, since – in our example – it is the average of 7 measurements.
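The choose-7-of-10 averaging described above can be sketched in a few lines. The ten measurement values below are made up for illustration; only the counting (C(10, 7) = 120 averaged subsets) comes from the article.

```python
from itertools import combinations
from math import comb

# Ten hypothetical repeated measurements of the same quantity.
measurements = [9.8, 10.1, 9.9, 10.0, 10.2, 9.7, 10.3, 9.9, 10.1, 10.0]

# Average every 7-element subset: each average is one augmented data point.
augmented = [sum(subset) / 7 for subset in combinations(measurements, 7)]

print(len(augmented))  # one augmented value per 7-element subset
print(comb(10, 7))     # 120, matching the article's count
```

Because each augmented value averages 7 measurements, it is less noisy than any single measurement, which is the key difference from augmentation schemes that inject artificial noise.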
The researchers first applied Newtonian blurring to a dataset of human braingraphs from 1,053 subjects, describing the cerebral connections between distinct areas of the human brain. They augmented the dataset 120-fold, generating 126,360 braingraphs from the original data. Each braingraph was calculated in five different resolutions, which allowed them to publish 5×126,360, or 631,800 braingraphs; the computation took approximately three weeks on the research group's 36 computers. The feasibility of the method in machine learning was demonstrated on the augmented dataset. Newtonian blurring can be used not only for braingraphs but also for many other datasets, including chemical and biological ones.
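The dataset sizes quoted above follow directly from the 120-fold augmentation factor and the five resolutions; a quick check:

```python
from math import comb

subjects = 1053
factor = comb(10, 7)              # 120-fold augmentation per subject
per_resolution = subjects * factor
total = 5 * per_resolution        # five resolutions per braingraph

print(per_resolution)  # 126360
print(total)           # 631800
```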
The publication can be accessed here.