On Systems of Neural ODEs with Generalized Power Activation Functions

Bibliographic Details
Main Authors: Vasiliy Ye. Belozyorov, Yevhen V. Koshel
Format: Article
Language: English
Published: Oles Honchar Dnipro National University, 2024-08-01
Series: Journal of Optimization, Differential Equations and Their Applications
Online Access: https://model-dnu.dp.ua/index.php/SM/article/view/201
Description
Summary: When constructing neural network-based models, it is common practice to use time-tested activation functions such as the hyperbolic tangent, the sigmoid, or the ReLU. These choices, however, may be suboptimal. The hyperbolic tangent and the sigmoid are differentiable but bounded, which can lead to the vanishing gradient problem. The ReLU is unbounded but not differentiable at 0, which may lead to suboptimal training with some optimizers. One can attempt to use sigmoid-like functions such as the cube root, but it is likewise not differentiable at 0. One activation function that is often overlooked is the identity function. Even though it does not by itself introduce nonlinear behavior into the model, it can help build more explainable models more quickly thanks to its negligible evaluation cost, while the nonlinearities can be provided by the model's evaluation rule. In this article, we explore the use of a specially designed unbounded, differentiable generalized power activation function, the identity function, and their combinations for approximating univariate time series data with neural ordinary differential equations. Examples are given.
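The abstract does not give the exact form of the generalized power activation function, so the sketch below is only an illustrative assumption: it uses phi_p(x) = sign(x) * |x|^p with p > 1, which is unbounded and, unlike the cube root or the ReLU, differentiable everywhere (its derivative p * |x|^(p-1) tends to 0 as x tends to 0). The names power_activation, the exponent P, and the toy two-dimensional neural-ODE right-hand side mixing this activation with identity units are hypothetical, not taken from the paper.

# Illustrative sketch only; the assumed activation and the toy ODE are
# NOT the paper's definitions, just a plausible reading of the abstract.
import numpy as np
from scipy.integrate import solve_ivp

P = 1.5  # assumed exponent; any p > 1 keeps phi_p differentiable at x = 0

def power_activation(x, p=P):
    """phi_p(x) = sign(x) * |x|**p: unbounded, smooth at 0 for p > 1."""
    return np.sign(x) * np.abs(x) ** p

def power_activation_deriv(x, p=P):
    """phi_p'(x) = p * |x|**(p - 1); finite, and -> 0 at x = 0 for p > 1."""
    return p * np.abs(x) ** (p - 1)

# A toy neural-ODE right-hand side: one hidden layer whose units mix the
# power activation with the identity, echoing the combinations the
# abstract describes. Weights are random placeholders, not fitted values.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 2)) * 0.5, np.zeros(4)
W2, b2 = rng.normal(size=(2, 4)) * 0.5, np.zeros(2)

def rhs(t, x):
    h = W1 @ x + b1
    h[:2] = power_activation(h[:2])  # identity is kept on the other units
    return W2 @ h + b2

sol = solve_ivp(rhs, (0.0, 5.0), y0=[1.0, -0.5], dense_output=True)
print(sol.y[:, -1])                  # state at t = 5
print(power_activation_deriv(0.0))   # 0.0: smooth at the origin, unlike ReLU

For p slightly above 1, phi_p stays close to the identity near the origin while still growing without bound, which is consistent with the abstract's idea of pairing cheap identity units with a smooth, unbounded nonlinearity.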
ISSN: 2617-0108, 2663-6824