Evolution of reinforcement learning agents using the genetic algorithm

Волокита, А.; Герега, Б.

dc.contributor.author	Волокита, А.
dc.contributor.author	Герега, Б.
dc.date.accessioned	2023-11-07T11:00:06Z
dc.date.available	2023-11-07T11:00:06Z
dc.date.issued	2023
dc.identifier.uri	http://ir.stu.cn.ua/123456789/28990
dc.description	Волокита, А. Evolution of reinforcement learning agents using the genetic algorithm / А. Волокита, Б. Герега // Технічні науки та технології. - 2023. - No. 2 (32). - pp. 175-184.	uk_UA
dc.description.abstract	This study examines the use of genetic algorithms to improve the performance of reinforcement learning agents. We ran trials with various neural network parameters, in particular weights, biases, and activation functions, to find the values that lead the agent to collect more reward. Our approach uses domain knowledge both to initialize the population of the genetic algorithm and to evaluate solutions, which lets us steer the search toward more promising solutions. Particular attention is paid to the effect of different genetic algorithm parameters on learning efficiency. The potential applications of this research are broad, ranging from robotics and autonomous vehicles to games and finance. The results can also be used to develop new algorithms and methods for improving the performance of reinforcement learning agents, which will further advance machine learning. Our study showed that using a genetic algorithm can significantly improve the efficiency of agent training. The outcome of the work is the successful completion of the CartPole-v0 game by evolved agents: 98% of our population reaches the maximum, i.e., successfully completes the game.	uk_UA
dc.language.iso	uk	uk_UA
dc.publisher	Чернігів : НУ "Чернігівська політехніка"	uk_UA
dc.relation.ispartofseries	Технічні науки та технології;№ 2 (32)
dc.subject	reinforcement learning	uk_UA
dc.subject	genetic algorithm	uk_UA
dc.subject	agent	uk_UA
dc.subject	gradient-free approach	uk_UA
dc.subject	neural network	uk_UA
dc.subject	CartPole	uk_UA
dc.subject	policy gradients	uk_UA
dc.title	Evolution of reinforcement learning agents using the genetic algorithm	uk_UA
dc.title.alternative	Evolution of reinforcement learning agents using the genetic algorithm	uk_UA
dc.type	Article	uk_UA
dc.description.abstractalt1	Reinforcement learning (RL) allows agents to make decisions based on a reward function. However, the choice of parameter values for the learning algorithm can significantly affect the overall learning process. Agents that use the policy gradient algorithm can be trained for a long time and still not behave perfectly. On closer examination, the reason for the long training is that gradients are almost absent and therefore not very useful. Gradients help in supervised learning tasks, such as image classification, by providing useful information on how to change the parameters (weights or biases) of the network for better accuracy. In image classification, after each training mini-batch, backpropagation provides a clear gradient (direction) for each parameter in the network. In reinforcement learning, however, gradient information is available only occasionally, when the environment provides a reward or punishment. In most cases, the agent performs actions without knowing whether they are useful or not. Therefore, in this paper, we improve the agents by using a genetic algorithm, i.e., we evolve them. This research explores the use of genetic algorithms to improve the performance of reinforcement learning agents. We conducted a series of trials using various neural network parameters, including weights, biases, and activation functions, in order to find the values that cause the agent to receive more reward. Our approach uses domain knowledge both to initialize the population of the genetic algorithm and to evaluate solutions. This allows us to direct the search towards more promising solutions. Special attention is paid to the impact of various genetic algorithm parameters on learning efficiency. The potential applications of this research are broad, ranging from robotics and autonomous vehicles to gaming and finance. The results of the study can also be used to develop new algorithms and methods to improve the performance of reinforcement learning agents, which further contributes to the development of machine learning. Our research has shown that the use of a genetic algorithm can significantly improve the efficiency of agent learning. The result is the successful completion of the CartPole-v0 game by evolved agents: 98% of our population reaches the maximum, i.e., successfully completes the game.	uk_UA
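
The gradient-free idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and every name in it (`step`, `fitness`, `evolve`, the constants) is hypothetical. It makes several assumptions that the record does not confirm: a hand-written cart-pole simulation stands in for the CartPole-v0 environment, the policy is a single linear unit (four weights and one bias) rather than a full neural network, and the genetic algorithm uses truncation selection with Gaussian mutation and elitism, without crossover or domain-knowledge initialization.

```python
import math
import random

# Assumed constants for a simplified cart-pole, chosen to resemble the classic task.
GRAVITY, CART_MASS, POLE_MASS, POLE_HALF_LEN = 9.8, 1.0, 0.1, 0.5
FORCE, DT, MAX_STEPS = 10.0, 0.02, 200
X_LIMIT, THETA_LIMIT = 2.4, 12 * math.pi / 180


def step(state, push_right):
    """Advance the cart-pole by one explicit Euler step and return the new state."""
    x, x_dot, theta, theta_dot = state
    force = FORCE if push_right else -FORCE
    total_mass = CART_MASS + POLE_MASS
    pole_ml = POLE_MASS * POLE_HALF_LEN
    temp = (force + pole_ml * theta_dot ** 2 * math.sin(theta)) / total_mass
    theta_acc = (GRAVITY * math.sin(theta) - math.cos(theta) * temp) / (
        POLE_HALF_LEN * (4.0 / 3.0 - POLE_MASS * math.cos(theta) ** 2 / total_mass)
    )
    x_acc = temp - pole_ml * theta_acc * math.cos(theta) / total_mass
    return (x + DT * x_dot, x_dot + DT * x_acc,
            theta + DT * theta_dot, theta_dot + DT * theta_acc)


def fitness(genome, episodes=3):
    """Average number of steps survived; deterministic for a given genome."""
    rng = random.Random(0)  # fixed seed: every genome sees the same start states
    total = 0
    for _ in range(episodes):
        state = tuple(rng.uniform(-0.05, 0.05) for _ in range(4))
        for _ in range(MAX_STEPS):
            # Linear policy: genome[:4] are weights, genome[4] is the bias.
            activation = sum(w * s for w, s in zip(genome, state)) + genome[4]
            state = step(state, activation > 0)
            if abs(state[0]) > X_LIMIT or abs(state[2]) > THETA_LIMIT:
                break  # cart left the track or pole fell: episode ends
            total += 1
    return total / episodes


def evolve(pop_size=30, generations=15, elite=5, sigma=0.3, seed=1):
    """Truncation selection plus Gaussian mutation, keeping the elite unchanged."""
    rng = random.Random(seed)
    population = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        history.append(fitness(ranked[0]))  # best fitness of this generation
        parents = ranked[:elite]
        children = [[w + rng.gauss(0, sigma) for w in rng.choice(parents)]
                    for _ in range(pop_size - elite)]
        population = parents + children
    return ranked[0], history


if __name__ == "__main__":
    best, history = evolve()
    print("best fitness per generation:", history)
```

No gradients are computed anywhere: a genome is judged only by the episode length it achieves, which is the property the abstract relies on when rewards are sparse. Because the elite genomes are copied unchanged and the fitness evaluation is deterministic, the best fitness per generation cannot decrease in this sketch.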