Bridge frontend monitoring

To improve the user experience of the bridge, we need automated alerts for the bridge frontend, which requires setting up appropriate monitoring.

There are two main approaches that make sense to me at the moment:
1. Health checks: e2e Selenium tests.
2. Real user monitoring, e.g. akamai/boomerang on GitHub (end-user-oriented web performance testing and beaconing).

The first would be to set up scheduled health checks that trigger a scripted Selenium run through the web application. The shortcoming is that this only detects already known problems, much like the existing testing infrastructure. Detection would also be slow because of the time the checks take to execute, and the complexity of the setup would introduce additional flakiness. The wallet team took a similar approach in their e2e tests.
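To make the flakiness concern concrete, here is a minimal sketch of how a scheduled health check could retry before alerting. Everything in it is an assumption for illustration: `run_health_check`, `CheckResult`, and the `check` callable (which stands in for the scripted Selenium scenario) are hypothetical names, not part of any existing bridge tooling.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class CheckResult:
    ok: bool            # True if any attempt passed
    attempts: int       # number of attempts actually made
    elapsed_s: float    # wall-clock time spent, including retries
    error: str = ""     # description of the last failure, if any


def run_health_check(check: Callable[[], None],
                     max_attempts: int = 3,
                     retry_delay_s: float = 0.0) -> CheckResult:
    """Run `check` until it passes or `max_attempts` is exhausted.

    `check` is a placeholder for the scripted browser scenario: it should
    raise an exception on failure and return normally on success.
    """
    start = time.monotonic()
    last_error = ""
    for attempt in range(1, max_attempts + 1):
        try:
            check()
            return CheckResult(True, attempt, time.monotonic() - start)
        except Exception as exc:  # any exception counts as a failed attempt
            last_error = f"{type(exc).__name__}: {exc}"
            if attempt < max_attempts:
                time.sleep(retry_delay_s)
    return CheckResult(False, max_attempts, time.monotonic() - start, last_error)
```

An alert would only fire when `ok` is false after all attempts. Note that retries trade flakiness for an even longer detection time, which is the drawback mentioned above.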

Real user monitoring would let us capture general usage and statistics of the web application:
response codes from all API calls, JavaScript errors, and performance metrics.
All this data would give us simple yet robust detection of incidents.
This kind of service is offered by Google and many other companies.
I would prefer to keep it open source, in the spirit of NEAR.
Also, the intention of this tooling is not to spy on people but to collect the minimal amount of data
needed for monitoring.
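As a rough illustration of how such data could drive incident detection, here is a minimal sketch that flags an incident when the share of failing reports in a time window crosses a threshold. The `Beacon` shape, the `detect_incident` function, and the default thresholds are all assumptions made for this example; they are not boomerang's actual beacon format.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Beacon:
    """One minimal, anonymous report from a browser session (hypothetical shape)."""
    api_status: int   # HTTP status code of an API call made by the frontend
    js_error: bool    # whether an uncaught JavaScript error was reported


def detect_incident(beacons: Iterable[Beacon],
                    max_error_rate: float = 0.05,
                    min_samples: int = 20) -> bool:
    """Return True if the failure rate in this window exceeds `max_error_rate`.

    A beacon counts as failed if it carries a 5xx API status or a JS error.
    Windows with fewer than `min_samples` beacons never trigger, so that a
    handful of unlucky users does not page anyone.
    """
    window = list(beacons)
    if len(window) < min_samples:
        return False
    failed = sum(1 for b in window if b.api_status >= 500 or b.js_error)
    return failed / len(window) > max_error_rate
```

The beacon deliberately carries no user identifiers, in line with collecting only the minimal data needed for monitoring.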

Some people who could help out with this or simply give insight:
@chadoh @alex.shevchenko


Also adding @vgrichina and @frol, who could give some input on this.

FYI, we currently have health checks for the core bridge components in Grafana (access requires credentials).

An idea proposed by @alex.shevchenko was to have a scheduled script that sends a transaction back and forth through the bridge every day (no frontend involved). We can do the same with the frontend, as you proposed.

Real user monitoring would let us capture general usage and statistics of the web application:
response codes from all API calls, JavaScript errors, and performance metrics.
All this data would give us simple yet robust detection of incidents.

It would be great to collect all logs from all our services (mainly the relayers and the frontend) so that, in case of any error, we can debug, fix, and write an appropriate post-mortem more easily and quickly.

Also, the intention of this tooling is not to spy on people but to collect the minimal amount of data
needed for monitoring.

For this purpose we don’t need any kind of telemetry in user-side applications, and collecting information from a public blockchain does not seem to be a violation of privacy, so this should not be a concern.


Hey @mhala!

Thanks for your suggestions!
Please take a look at the description of the library. It includes the stages of the transfers, which would actually help with monitoring the transfers.

What do you think of this architecture?