Text this: Adaptive average arterial pressure control by multi-agent on-policy reinforcement learning