What if managing your instances were as easy as raising a JIRA ticket? Almost every DevOps team uses JIRA as its standard tool for issue tracking and task management, and many of our customers prefer an integrated approach to their cloud and their workflows. We’ve therefore built a workflow that makes it easier to manage instance states through JIRA triggers.
To automate the process end to end, the workflow raises a JIRA ticket when a CloudWatch alarm goes off, executes the corrective action, and then closes the ticket. In effect, the workflow acts as a virtual DevOps engineer. The action here is rebooting the instances when an alarm for high CPU utilization goes off. See the detailed workflow docs here.
Reboot the process associated with a machine by raising a ticket. The Apache servers associated with the EC2 machines are rebooted when they cause high CPU utilization (set the threshold as per your need). The trigger is the set of tags specified in the Jira ticket description. The workflow creates a Jira ticket when the CloudWatch alarm reports high CPU utilization, and after the machines reboot, it closes the ticket before the workflow ends.
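The ticket lifecycle described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not TotalCloud's implementation: `create_ticket`, `find_instances`, `reboot`, and `close_ticket` are hypothetical callables standing in for the Jira and EC2 integrations, and the `key=value` format for tags in the ticket description is an assumed convention.

```python
import re
from typing import Callable, Dict, List

# Assumed convention: tags appear in the ticket description as key=value tokens.
TAG_PATTERN = re.compile(r"\b([\w:-]+)=([\w:.-]+)")


def parse_tags(description: str) -> Dict[str, str]:
    """Extract key=value tags from a ticket description."""
    return {key: value for key, value in TAG_PATTERN.findall(description)}


def remediate_high_cpu(
    description: str,
    create_ticket: Callable[[str], str],
    find_instances: Callable[[Dict[str, str]], List[str]],
    reboot: Callable[[List[str]], None],
    close_ticket: Callable[[str], None],
) -> List[str]:
    """Open a ticket, reboot the instances matching its tags, close the ticket."""
    ticket_id = create_ticket(description)   # raised when the alarm fires
    tags = parse_tags(description)           # tags select the target machines
    instance_ids = find_instances(tags)
    if instance_ids:
        reboot(instance_ids)                 # the corrective action
    close_ticket(ticket_id)                  # closed before the workflow ends
    return instance_ids
```

Passing the integrations in as callables keeps the sequence (create, act, close) visible and lets you try the flow with stubs before wiring it to real systems.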
A common risk in instance management is overutilization of disk space. Several factors can push disk utilization over 90%: for example, user-initiated heavy workloads, analytic queries, prolonged deadlocks and lock waits, multiple concurrent transactions, long-running transactions, or other processes that utilize CPU resources.
Over-utilized instances can run into performance issues that later affect your budget. Having a simple, automated means of scaling the volume of your instances when necessary takes the management overhead off your hands. This use case focuses on automatically increasing disk space by a defined amount when disk space utilization (DSU) above 90% is detected.
In this particular template, you instruct the workflow to increase the DB size by 20GB when the disk space utilization crosses 90%. This event (DSU > 90%) sets off a CloudWatch Alarm, which triggers the TotalCloud workflow. Even if your CloudWatch Alarm alerts you of overutilization in the middle of the night, the workflow will have handled it before you even have to think about responding. Since it’s automated, the fix is executed immediately, eliminating any response time delays. If you wish to approve the action before it occurs, you can enable user approval as well.
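The rule in this template (add 20GB once DSU crosses 90%, optionally gated on approval) can be sketched as follows. This is a simplified sketch, not the template's actual logic: the function names are hypothetical, `ec2_client` is assumed to be a boto3 EC2 client (or any object exposing a compatible `modify_volume` method), and sizes are treated as GiB, which is the unit EBS uses.

```python
def next_volume_size(current_gib: int, utilization_pct: float,
                     threshold_pct: float = 90.0, increment_gib: int = 20) -> int:
    """Return the target size: grow by a fixed increment above the threshold."""
    if utilization_pct > threshold_pct:
        return current_gib + increment_gib
    return current_gib


def grow_volume(ec2_client, volume_id: str, current_gib: int,
                utilization_pct: float, approved: bool = True) -> int:
    """Resize the volume if the rule fires and the action is approved."""
    target = next_volume_size(current_gib, utilization_pct)
    if target == current_gib or not approved:
        return current_gib  # nothing to do, or still waiting for approval
    # boto3's EC2 client exposes modify_volume(VolumeId=..., Size=...)
    ec2_client.modify_volume(VolumeId=volume_id, Size=target)
    return target
```

For example, `next_volume_size(100, 93.5)` returns 120, while a reading of exactly 90% leaves the size unchanged because the rule only fires above the threshold.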
The workflow increases your EBS volume by 20GB as a default value, which you can change depending on your workload demands. When a CloudWatch Alarm goes off and sends an SNS alert for high disk space utilization, the workflow is automatically triggered and executes the action. As we’ve pointed out, you can set the trigger to be anything: a CloudWatch Alarm, any other external system, platform, or ticketing system such as JIRA. In this case, you can also instruct the workflow to create the ticket on your ticketing platform when the Alarm goes off. It can then close the ticket once remediation is completed. This is helpful for logging purposes and for enabling end-to-end automation.
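To make the alarm-to-trigger hand-off concrete, here is a rough sketch of how an SNS-delivered CloudWatch alarm could be unpacked. It assumes the notification arrives in the Lambda-style SNS envelope (`Records[0].Sns.Message`) and that the message carries the usual CloudWatch alarm fields (`AlarmName`, `NewStateValue`, `Trigger.Dimensions`); check these against a real notification before relying on them. `parse_alarm_event` is a hypothetical helper, not part of TotalCloud.

```python
import json


def parse_alarm_event(event: dict) -> dict:
    """Pull the alarm name, state, metric and instance ID out of an SNS event."""
    # The SNS envelope carries the alarm as a JSON-encoded string.
    message = json.loads(event["Records"][0]["Sns"]["Message"])
    trigger = message.get("Trigger", {})
    # Dimensions are assumed to be a list of {"name": ..., "value": ...} pairs.
    dimensions = {d["name"]: d["value"] for d in trigger.get("Dimensions", [])}
    return {
        "alarm_name": message.get("AlarmName"),
        "in_alarm": message.get("NewStateValue") == "ALARM",
        "metric": trigger.get("MetricName"),
        "instance_id": dimensions.get("InstanceId"),
    }
```

A workflow would typically act only when `in_alarm` is true, so that the follow-up notification sent when the alarm returns to OK does not trigger a second resize.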
After the workflow matches the instances to be modified, it requests user approval. Once it receives the go-ahead, it increases the EBS volume and sends an SSM command that applies the enlarged volume to the instance and informs the OS.
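One plausible reading of the "inform the OS" step is that the SSM command grows the partition and filesystem so the instance can use the added space. The sketch below builds the arguments for boto3's `ssm.send_command` using the `AWS-RunShellScript` document. The device name, partition number, filesystem type, and the assumption that the volume is mounted at `/` are all illustrative; the commands the actual workflow sends are not given in the source.

```python
def build_resize_command(instance_id: str, device: str = "/dev/xvda",
                         partition: int = 1, filesystem: str = "ext4") -> dict:
    """Build send_command arguments that extend the partition and filesystem."""
    # Note: NVMe devices name partitions differently (e.g. /dev/nvme0n1p1).
    partition_path = f"{device}{partition}"
    if filesystem == "xfs":
        grow_fs = "xfs_growfs -d /"            # assumes the volume is mounted at /
    elif filesystem == "ext4":
        grow_fs = f"resize2fs {partition_path}"
    else:
        raise ValueError(f"unsupported filesystem: {filesystem}")
    return {
        "InstanceIds": [instance_id],
        "DocumentName": "AWS-RunShellScript",
        "Parameters": {
            "commands": [
                f"growpart {device} {partition}",  # extend the partition first
                grow_fs,                           # then grow the filesystem
            ]
        },
    }
```

With a boto3 SSM client, the result could be passed along as `ssm.send_command(**build_resize_command("i-0abc"))`. Building the arguments separately makes it easy to show them to an approver before anything runs.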
The workflow accomplishes two primary steps with a total of 8 nodes. The first step is to filter out the right instance(s) using simple conditional operations. The second is to modify the volume and apply it to your instance.
The workflow transfers the logs in the log folder of your EC2 machines into a specified S3 bucket. This use case lets you keep the logs you want without having to worry about increasing disk space on the machine.
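As a rough illustration of the transfer step, the sketch below walks a log folder and uploads each file to S3. It assumes `s3_client` is a boto3 S3 client (or any object with a compatible `upload_file(Filename, Bucket, Key)` method); the `*.log` pattern, the key prefix, and the `upload_logs` helper itself are hypothetical choices, not details from the workflow.

```python
from pathlib import Path
from typing import List


def upload_logs(s3_client, log_dir: str, bucket: str, prefix: str = "") -> List[str]:
    """Upload every *.log file under log_dir to the bucket; return the S3 keys."""
    root = Path(log_dir)
    uploaded = []
    for path in sorted(root.rglob("*.log")):
        # Preserve the folder layout below log_dir inside the bucket.
        key = f"{prefix}{path.relative_to(root).as_posix()}"
        s3_client.upload_file(str(path), bucket, key)
        uploaded.append(key)
    return uploaded
```

Using a per-instance prefix such as the instance ID keeps logs from different machines apart in the same bucket. Deleting local files after a successful upload is left out here on purpose, since that is what actually frees disk space and deserves its own safeguards.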