What do you like best?
Of all the tools we evaluated, OpsGenie is the most full-featured, while also having one of the best interfaces that I've seen. It supports all the integrations that we need and then some, and has a powerful API that we've used to write several custom integrations. The routing and policy rules are powerful yet easy to understand. And on top of all that, their Support Team has been stellar! They are super quick to answer, knowledgeable, and make sure that our questions are resolved every time.
What do you dislike?
I'd love to see a bit more in terms of alert aggregation, possibly a machine learning algorithm that identifies and groups together related alerts. Also, while their new Reports are awesome, there are still a few more report features we'd like to have -- especially surrounding Incidents.
Recommendations to others considering the product
OpsGenie is relatively new to the alerting world (compared to existing competitors), but they seem to be doing a lot of things right! We evaluated about a dozen other services before settling on OpsGenie. They were the best in terms of features AND price. Make sure you consider all of your alerting sources and see which service supports all of them. For us, only OpsGenie fully supported every alert source we had.
What business problems are you solving with the product? What benefits have you realized?
We are using OpsGenie for several reasons:
1. We can collect all alerts in one central location. Before, we were sending a lot of alerts to Slack, but some would also go to email. If we were trying to search through past alerts, it wasn't easy to know where to look. Now we just search in OpsGenie.
2. With all our alerts going to OpsGenie, we can now run reports and see a lot of useful metrics about our alerts.
3. We were getting a lot of noisy alerts. OpsGenie gives us ways to reduce the noise. In practice, we've seen up to 90% noise reduction with a few simple policies in place.
4. We needed a way to log Incidents and capture the information about them such as resolution time, how the Incident was resolved, etc. We can now do that with OpsGenie Incidents.