Another question is how to roll out an application update. Azure recommends techniques such
as blue-green deployment or canary releases that push updates in a highly controlled way to
minimize possible impacts from a bad deployment.
• Blue-green deployment is a technique in which an update is deployed into a production
environment separate from the live application. After you validate the deployment, switch
the traffic routing to the updated version. For example, Azure App Service Web Apps
enables this with staging slots.
• Canary releases are like blue-green deployments. Instead of switching all traffic to the
updated version, you roll out the update to a small percentage of users, by routing a portion
of the traffic to the new deployment. If there is a problem, back off and revert to the old
deployment. Otherwise, route more of the traffic to the new version, until it gets 100% of
the traffic.
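The canary rollout described above can be sketched in a few lines. This is a minimal illustration, not how any Azure service implements traffic splitting: the function names, the hash-based bucketing, the 1% error threshold, and the 25-point step are all assumptions chosen for the example.

```python
import hashlib


def bucket(user_id: str) -> int:
    """Map a user ID to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % 100


def route(user_id: str, canary_percent: int) -> str:
    """Return which deployment serves this user (hypothetical names)."""
    return "canary" if bucket(user_id) < canary_percent else "stable"


def next_canary_percent(current: int, error_rate: float,
                        max_error_rate: float = 0.01, step: int = 25) -> int:
    """Back off to 0% on a problem; otherwise route more traffic, up to 100%."""
    if error_rate > max_error_rate:
        return 0  # revert all traffic to the old deployment
    return min(100, current + step)
```

Hashing the user ID, rather than choosing randomly per request, keeps each user on the same version between requests, and a user who is already on the canary stays there as the percentage grows.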
Whatever approach you take, make sure that you can roll back to the last-known-good
deployment, in case the new version is not functioning. Also have a strategy in place to roll back
database changes and any other changes to dependent services. If errors occur, the application
logs must indicate which version caused the error.
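One way to make the logs indicate which version caused an error is to stamp every log record with the deployed version. The sketch below uses Python's standard logging module. The version string, the logger name, and the log format are hypothetical, and a real application would more likely read the version from its build or deployment metadata.

```python
import io
import logging

APP_VERSION = "1.4.2"  # hypothetical; normally injected at build or deploy time


class VersionFilter(logging.Filter):
    """Attach the application version to every log record."""

    def __init__(self, version: str):
        super().__init__()
        self.version = version

    def filter(self, record: logging.LogRecord) -> bool:
        record.app_version = self.version
        return True  # never drop the record, only annotate it


def make_logger(version: str, stream) -> logging.Logger:
    """Build a logger whose output includes the version on every line."""
    logger = logging.getLogger("app")
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter("%(levelname)s version=%(app_version)s %(message)s"))
    handler.addFilter(VersionFilter(version))
    logger.handlers.clear()
    logger.addHandler(handler)
    return logger


stream = io.StringIO()
log = make_logger(APP_VERSION, stream)
log.error("payment failed")
```

During a canary or blue-green rollout, both versions write to the same log store, so a field such as version= lets you filter errors by the deployment that produced them.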
Monitor to detect failures
Monitoring is crucial for resiliency. If something fails, you need to know that it failed, and you
need insights into the cause of the failure.
Monitoring a large-scale distributed system poses a significant challenge. Think about an
application that runs on a few dozen VMs — it's not practical to log into each VM, one at a time,
and look through log files, trying to troubleshoot a problem. Moreover, the number of VM
instances is probably not static. VMs get added and removed as the application scales in and out,
and occasionally an instance may fail and need to be reprovisioned. In addition, a typical cloud
application might use multiple data stores (Azure storage, SQL Database, Cosmos DB, Redis
cache), and a single user action can span multiple subsystems.
You can think of the monitoring process as a pipeline with several distinct stages:
• Instrumentation. The raw data for monitoring comes from a variety of sources,
including application logs, operating system performance metrics, Azure monitoring
resources, Azure Service Health, subscriptions, and Azure tenants. Most Azure services
expose metrics that you can configure and analyze to determine the cause of problems.
• Collection and storage. Raw instrumentation data can be held in various locations and with
various formats (for example, application trace logs, IIS logs, performance counters). These
disparate sources are collected, consolidated, and put into reliable data stores such as
Application Insights, Azure Monitor metrics, Service Health, storage accounts and Log
Analytics.