Azure Monitor - Activity Logging, Resource Logging and Alerts

Making Sense of Logging in Azure with Azure Monitor, Diagnostic Settings and Activity Log Alerts

I decided this week to finally get my head around how logging and monitoring work in Azure. Rather than just referring to the CIS Benchmark, I wanted to be able to give clients more meaningful advice about how to get started with logging and monitoring in Azure, and to provide a baseline set of diagnostic settings and log alerts. To monitor your Azure environment effectively, you need to capture logs. A lack of monitoring reduces visibility into the control and data planes, and therefore an organization's ability to detect and respond to malicious activity.

Azure supports three types of logs:

  • Activity logs - provide insight into the operations performed on each Azure resource in the subscription from the outside (known as the management plane), in addition to updates on Service Health events. Use the Activity log to determine the what, who, and when for any write action executed on the resources in your subscription. There's a single activity log for each Azure subscription.
  • Microsoft Entra ID Activity logs - contain the history of sign-in activity and an audit trail of changes made in Microsoft Entra ID for a particular tenant. They are viewed via the Microsoft Entra admin center but can be integrated with Azure Monitor via a Log Analytics workspace.
  • Resource logs - provide an insight into operations that were performed within an Azure resource. This is known as the data plane. Examples include getting a secret from a key vault, or making a request to a database. The contents of resource logs vary according to the Azure service and resource type and are available for each individual resource within a subscription.

Basic Activity Logs are collected by default and retained for 90 days. Resource Logs (previously known as Diagnostic Logs) aren't collected until they're enabled and routed to a destination.

Diagnostic Settings define the type of events that are logged and where to send them. You can manage Diagnostic Settings at the subscription level, which gives you additional control over how Activity Logs are captured and retained beyond the defaults provided by Microsoft. Diagnostic Settings are also available for each individual resource within a subscription in order to capture Resource Logs. When configuring Diagnostic Settings, you can export to one or more of four destination types: Log Analytics, Event Hub, Storage Account, and various Partner Solutions. Whichever you choose, make sure it is configured with appropriate data retention.

A good baseline recommendation for configuring Diagnostic Settings is as follows:

  • Enable diagnostic settings for your subscription in order to capture more detailed activity logs and retain them beyond the default 90 day period. Ensure that at least the “Administrative”, “Alert”, “Policy” and “Security” categories are enabled for capture.
  • If storing logs in a Storage Account container, ensure the container is not publicly accessible.
  • If storing logs in a Storage Account container, encrypt the storage account that holds the container with a Customer Managed Key (CMK).
  • Enable all Diagnostic Settings for Azure Key Vault.
  • Enable HTTP Logs for Azure App Service.
  • Enable Diagnostic Settings for all mission critical resources that support it.
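To make the baseline a little more concrete, here is a minimal sketch of what the diagnostic setting payloads could look like, built as plain Python dictionaries in the shape of a `Microsoft.Insights/diagnosticSettings` resource. The helper name, the workspace ID and the `AuditEvent` and `AppServiceHTTPLogs` category names are my assumptions for illustration, so check the categories each resource type actually exposes before relying on them.

```python
import json

# Activity log categories named in the baseline above.
BASELINE_ACTIVITY_CATEGORIES = ["Administrative", "Alert", "Policy", "Security"]

# Hypothetical Log Analytics workspace ID, for illustration only.
WORKSPACE_ID = (
    "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/rg-logging"
    "/providers/Microsoft.OperationalInsights/workspaces/law-baseline"
)


def diagnostic_setting_body(categories, workspace_id=None, storage_account_id=None):
    """Build the body of a diagnostic setting (hypothetical helper, not an SDK call)."""
    if not (workspace_id or storage_account_id):
        raise ValueError("a diagnostic setting needs at least one destination")
    properties = {
        "logs": [{"category": name, "enabled": True} for name in categories],
    }
    if workspace_id:
        properties["workspaceId"] = workspace_id
    if storage_account_id:
        properties["storageAccountId"] = storage_account_id
    return {"properties": properties}


# Subscription-level setting covering the Activity log categories.
subscription_setting = diagnostic_setting_body(
    BASELINE_ACTIVITY_CATEGORIES, workspace_id=WORKSPACE_ID
)
# Resource-level settings; category names are assumptions to verify per service.
key_vault_setting = diagnostic_setting_body(["AuditEvent"], workspace_id=WORKSPACE_ID)
app_service_setting = diagnostic_setting_body(
    ["AppServiceHTTPLogs"], workspace_id=WORKSPACE_ID
)

print(json.dumps(subscription_setting, indent=2))
```

The same body could be sent through an ARM template, the REST API or the CLI. The sketch only builds the JSON and does not call Azure.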

Consider implementing Diagnostic Settings for all appropriate resources in your environment. Given that the mean time to detection in an enterprise is 240 days, a retention period of two years is recommended.

The process of deploying Diagnostic Settings can be difficult to manage when you have many resources. ARM Templates can be used, but to simplify creating and applying diagnostic settings at scale, use Azure Policy to automatically generate diagnostic settings for both new and existing resources. At an additional cost, you can choose to route the diagnostics to a Log Analytics Workspace so that they can be used in Azure Monitor or Microsoft Sentinel (formerly Azure Sentinel). Costs for monitoring will also vary with log volume. Not every resource necessarily needs logging enabled, and not all events need to be logged. Consider compliance and governance requirements, along with the security classification of the data being processed by a given resource, so you can adjust the level of logging accordingly.
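As a rough illustration of the Azure Policy approach, the sketch below builds a `deployIfNotExists` policy rule as a Python dictionary. It is modelled on the general shape of policy rules that deploy diagnostic settings, not copied from a specific built-in definition. The function name, the `setByPolicy` setting name, the zeroed role definition GUID and the `workspaceId` policy parameter (which would need to be declared in the surrounding policy definition) are all assumptions for illustration.

```python
import json


def diagnostics_policy_rule(resource_type, log_category, role_definition_ids):
    """Sketch a deployIfNotExists rule that adds a diagnostic setting to one resource type."""
    # Nested ARM template deployed onto each matching resource.
    template = {
        "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
        "contentVersion": "1.0.0.0",
        "parameters": {
            "resourceName": {"type": "string"},
            "workspaceId": {"type": "string"},
        },
        "resources": [
            {
                "type": f"{resource_type}/providers/diagnosticSettings",
                "apiVersion": "2021-05-01-preview",
                # 'setByPolicy' is a hypothetical name for the diagnostic setting.
                "name": "[concat(parameters('resourceName'), '/Microsoft.Insights/setByPolicy')]",
                "properties": {
                    "workspaceId": "[parameters('workspaceId')]",
                    "logs": [{"category": log_category, "enabled": True}],
                },
            }
        ],
    }
    return {
        "if": {"field": "type", "equals": resource_type},
        "then": {
            "effect": "deployIfNotExists",
            "details": {
                "type": "Microsoft.Insights/diagnosticSettings",
                # The resource is compliant once an enabled log setting exists.
                "existenceCondition": {
                    "field": "Microsoft.Insights/diagnosticSettings/logs.enabled",
                    "equals": "true",
                },
                "roleDefinitionIds": list(role_definition_ids),
                "deployment": {
                    "properties": {
                        "mode": "incremental",
                        "template": template,
                        "parameters": {
                            "resourceName": {"value": "[field('name')]"},
                            "workspaceId": {"value": "[parameters('workspaceId')]"},
                        },
                    }
                },
            },
        },
    }


# Placeholder role definition ID; substitute the roles your remediation identity needs.
PLACEHOLDER_ROLE = (
    "/providers/Microsoft.Authorization/roleDefinitions/"
    "00000000-0000-0000-0000-000000000000"
)
rule = diagnostics_policy_rule("Microsoft.KeyVault/vaults", "AuditEvent", [PLACEHOLDER_ROLE])
print(json.dumps(rule["if"], indent=2))
```

New resources are picked up when they are created or updated. Existing resources generally need a remediation task before the policy deploys anything to them.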

Once you have a good baseline of Activity and Resource logs being collected, you can start creating alerts to help monitor your environment for suspicious and malicious activity.

Microsoft Azure supports four different types of alerts:

  • Metric Alerts - Metric data is stored in the system already pre-computed. Metric alerts are useful when you want to be alerted about data that requires little or no manipulation. Use metric alerts if the data you want to monitor is available in metric data.
  • Log Alerts - You can use log alerts to perform advanced logic operations on your data. If the data you want to monitor is available in logs, or requires advanced logic, you can use the robust features of Kusto Query Language (KQL) for data manipulation by using log alerts.
  • Activity Log Alerts - Activity logs provide auditing of all actions that occurred on resources. Use activity log alerts to be alerted when a specific event happens to a resource, like a restart, a shutdown, or the creation or deletion of a resource. Service Health alerts and Resource Health alerts are also activity log alerts, and let you know when there's an issue with one of your services or resources.
  • Smart Detection Alerts - Smart detection on an Application Insights resource automatically warns you of potential performance problems and failure anomalies in your web application. You can migrate smart detection on your Application Insights resource to create alert rules for the different smart detection modules.

Activity Log Alerts will monitor all events in a subscription and can be used to detect malicious control plane activity.

The CIS Benchmark recommends implementing the following rules:

  • Create Policy Assignment
  • Delete Policy Assignment
  • Create or Update Network Security Group
  • Delete Network Security Group
  • Create or Update Security Solution
  • Delete Security Solution
  • Create or Update SQL Server Firewall Rule
  • Delete SQL Server Firewall Rule
  • Create or Update Public IP Address
  • Delete Public IP Address
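The CIS rules above map onto Activity Log Alert resources fairly mechanically, so here is a hedged sketch that generates one alert definition per rule as a Python dictionary in the shape of a `Microsoft.Insights/activityLogAlerts` resource. The operation names and the category paired with each one reflect my reading of the benchmark and should be checked against the version you follow. The helper name, alert naming scheme, subscription ID and action group ID are hypothetical.

```python
# (rule name, Activity log category, operation name) for each CIS rule listed above.
CIS_RULES = [
    ("Create Policy Assignment", "Administrative", "Microsoft.Authorization/policyAssignments/write"),
    ("Delete Policy Assignment", "Administrative", "Microsoft.Authorization/policyAssignments/delete"),
    ("Create or Update Network Security Group", "Administrative", "Microsoft.Network/networkSecurityGroups/write"),
    ("Delete Network Security Group", "Administrative", "Microsoft.Network/networkSecurityGroups/delete"),
    ("Create or Update Security Solution", "Security", "Microsoft.Security/securitySolutions/write"),
    ("Delete Security Solution", "Security", "Microsoft.Security/securitySolutions/delete"),
    ("Create or Update SQL Server Firewall Rule", "Administrative", "Microsoft.Sql/servers/firewallRules/write"),
    ("Delete SQL Server Firewall Rule", "Administrative", "Microsoft.Sql/servers/firewallRules/delete"),
    ("Create or Update Public IP Address", "Administrative", "Microsoft.Network/publicIPAddresses/write"),
    ("Delete Public IP Address", "Administrative", "Microsoft.Network/publicIPAddresses/delete"),
]

# Hypothetical identifiers, for illustration only.
SUBSCRIPTION_ID = "00000000-0000-0000-0000-000000000000"
ACTION_GROUP_ID = (
    f"/subscriptions/{SUBSCRIPTION_ID}/resourceGroups/rg-monitoring"
    "/providers/Microsoft.Insights/actionGroups/ag-security"
)


def activity_log_alert(rule_name, category, operation_name, subscription_id, action_group_id):
    """Build an Activity Log Alert definition scoped to a whole subscription."""
    return {
        "type": "Microsoft.Insights/activityLogAlerts",
        "apiVersion": "2020-10-01",
        "name": "cis-" + rule_name.lower().replace(" ", "-"),
        "location": "Global",
        "properties": {
            "enabled": True,
            "scopes": [f"/subscriptions/{subscription_id}"],
            # Every condition must match for the alert to fire.
            "condition": {
                "allOf": [
                    {"field": "category", "equals": category},
                    {"field": "operationName", "equals": operation_name},
                ]
            },
            "actions": {"actionGroups": [{"actionGroupId": action_group_id}]},
        },
    }


alerts = [
    activity_log_alert(name, category, operation, SUBSCRIPTION_ID, ACTION_GROUP_ID)
    for name, category, operation in CIS_RULES
]
for alert in alerts:
    print(alert["name"], "->", alert["properties"]["condition"]["allOf"][1]["equals"])
```

Generating the definitions from a table like this keeps the alert set easy to review and extend with further operations.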

Trend Micro recommends the following:

  • Create Policy Assignment
  • Create or Update Load Balancer
  • Create or Update Public IP Address
  • Create/Update Security Solution
  • Create or Update Virtual Machine
  • Create/Update/Delete SQL Server Firewall Rule
  • Create/Update Azure SQL Database
  • Create/Update Network Security Group
  • Create/Update Storage Account
  • Deallocate Virtual Machine
  • Delete Azure SQL Database
  • Delete Key Vault
  • Delete Load Balancer
  • Delete Network Security Group Rule
  • Delete Network Security Group
  • Delete Policy Assignment
  • Delete Public IP Address
  • Delete Security Solution
  • Delete Storage Account
  • Delete Virtual Machine
  • Power Off Virtual Machine
  • Rename Azure SQL Database
  • Update Key Vault
  • Update Security Policy
  • Create/Update MySQL Database
  • Create/Update Network Security Group Rule
  • Create/Update PostgreSQL Database
  • Delete MySQL Database
  • Delete PostgreSQL Database

Resources