Quick summary
This file expands the prior IaC playbook: each of the 25 recommendations now has five actionable sub-recommendations and three concrete examples (eight subitems each). Portal and GitHub click references have five steps each. Useful commands include expanded PowerShell/CLI plus Git/GitHub, Docker, AKS and kubectl command sets (25 commands each). The CI workflow runbook (3-stage) has five subitems per stage.
Top 25 IaC Recommendations
1. Design modular Bicep modules
- Split by domain: network, identity, storage, compute, monitoring (single responsibility).
- Keep module scope small so unit tests are feasible and side-effects are minimized.
- Provide stable interfaces and avoid leaking internal names/IDs to consumers.
- Favor composition: root template composes modules rather than inlining resources.
- Publish module documentation and examples adjacent to module code.
- Example A: /modules/network.bicep exports vnetId, subnetId, nsgId.
- Example B: /modules/storage.bicep outputs storageAccountId, keyVaultRef sample.
- Example C: Use module registry: publish network module v1.2.0 and reference by version in root.bicep.
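Example A above could look like the following minimal sketch (the API version, address ranges, and property choices are illustrative assumptions, not a canonical module):

```bicep
// modules/network.bicep: hypothetical minimal module with a stable interface.
param location string
param vnetName string
param addressSpace string = '10.0.0.0/16'

resource vnet 'Microsoft.Network/virtualNetworks@2023-04-01' = {
  name: vnetName
  location: location
  properties: {
    addressSpace: { addressPrefixes: [addressSpace] }
    subnets: [
      { name: 'default', properties: { addressPrefix: '10.0.0.0/24' } }
    ]
  }
}

// Expose only what consumers need; internal resource names stay encapsulated.
output vnetId string = vnet.id
output subnetId string = vnet.properties.subnets[0].id
```

The root template then consumes `vnetId`/`subnetId` without ever referencing the module's internal resource symbols.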
2. Define clear module interfaces (params & outputs)
- Use explicit, typed parameters with allowedValues for controlled inputs.
- Provide sensible defaults for non-sensitive parameters to reduce repetition.
- Export minimal outputs (IDs, connection strings, FQDNs) needed by consumers.
- Document each parameter purpose and validation rules in README and in param comments.
- Version-breaking changes require new module major version; avoid silent API drift.
- Example A: network.bicep constrains input with @allowed(['eastus','westeurope']) on param location string.
- Example B: storage module outputs accountName and primaryBlobEndpoint used by apps.
- Example C: sample parameters file main.prod.parameters.json demonstrating secret references to Key Vault.
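In Bicep, controlled inputs are expressed with decorators; a minimal sketch combining the patterns above (names and defaults are illustrative):

```bicep
@description('Deployment region, constrained to approved locations.')
@allowed(['eastus', 'westeurope'])
param location string = 'eastus'

@description('Storage SKU; the default keeps dev deployments cheap.')
@allowed(['Standard_LRS', 'Premium_LRS'])
param sku string = 'Standard_LRS'

@secure()
@description('No default: supplied at deploy time, e.g. via a Key Vault reference.')
param adminPassword string
```

`@secure()` parameters are excluded from deployment logs and must never be given literal defaults.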
3. Use a single canonical root module per environment
- Root module composes logical modules and loads parameter files per environment.
- Keep environment-specific overrides in parameter files, not in code.
- Use CI to select the appropriate parameter file by branch or environment variable.
- Keep a lightweight entry-point that is easy to review and vet in PRs.
- Reject direct edits to root without CI validation and at least one reviewer familiar with topology.
- Example A: main.bicep includes modules/network.bicep and modules/storage.bicep.
- Example B: main.dev.parameters.json contains dev-specific size/sku values and diagnostic flags.
- Example C: GitHub Actions job passes --parameters @main.prod.parameters.json on production deploy.
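The branch-to-parameter-file selection in Example C can be sketched as a small shell function (the file names and branch patterns are illustrative assumptions):

```shell
# pick_params maps the current branch to an environment parameters file so the
# root template itself never changes per environment.
pick_params() {
  case "$1" in
    main)      echo "main.prod.parameters.json" ;;
    release/*) echo "main.stage.parameters.json" ;;
    *)         echo "main.dev.parameters.json" ;;
  esac
}

# A CI job would then deploy with the selected file, e.g.:
#   az deployment group create ... --parameters @"$(pick_params "$GITHUB_REF_NAME")"
pick_params "main"   # prints main.prod.parameters.json
```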
4. Keep secrets out of code; use Key Vault + managed identities
- Never commit secrets to Git; use Key Vault and references from runtime or pipeline secrets.
- Use system-assigned or user-assigned managed identities for deployments and resource access.
- Restrict Key Vault access by RBAC and Key Vault firewall; require approval to add new principals.
- Use Key Vault references for App Service and MSI-based retrieval for VMs/containers.
- Rotate secrets and enable purge protection/soft-delete for Key Vaults containing production secrets.
- Example A: use Key Vault reference in app setting: @Microsoft.KeyVault(SecretUri=...)
- Example B: GitHub Actions uses azure/login OIDC to get short-lived token and then fetch Key Vault secrets.
- Example C: CI pipeline stores non-sensitive config in parameter files and secrets in Key Vault accessed by MSI.
5. Implement strict RBAC scoping for deployment principals
- Assign least-privilege roles scoped to the resource group or management group required for deploys.
- Prefer role assignments to groups or service principals with just-in-time elevation via PIM.
- Audit role assignments and require periodic access reviews for CI principals.
- Use Azure AD federated credentials (OIDC) instead of long-lived client secrets where possible.
- Block deployment principal from performing destructive operations by separating duties (deploy vs admin).
- Example A: SPN assigned "Contributor" only on rg-iac-prod scope, not subscription.
- Example B: Use PIM for human elevatable roles and require approval for changes.
- Example C: GitHub Actions uses environment protection with required reviewers and OIDC to avoid stored secrets.
6. Use parameter files per environment and secret streams
- Keep one canonical parameters file per environment (dev/stage/prod) and validate them in CI.
- Do not store secrets in parameter files; reference Key Vault or CI secrets instead.
- Validate parameter schemas and allowed values through linting tools or custom policies.
- Use parameters for SKUs, locations, tags and feature toggles to avoid code changes.
- Secure parameter files in repo by branch protections and require PR reviews for changes to prod files.
- Example A: dev.parameters.json with smaller SKUs and feature flags enabled.
- Example B: prod.parameters.json references Key Vault secret URIs for connection strings.
- Example C: CI pipeline injects secrets via environment secrets and merges runtime parameters before deploy.
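Example B above relies on the ARM parameter-file Key Vault reference shape; a sketch with placeholder IDs (vault and secret names are assumptions):

```json
{
  "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentParameters.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "sqlConnectionString": {
      "reference": {
        "keyVault": {
          "id": "/subscriptions/SUBSCRIPTION_ID/resourceGroups/rg-kv/providers/Microsoft.KeyVault/vaults/kv-prod"
        },
        "secretName": "sql-connection-string"
      }
    }
  }
}
```

The deploying principal needs Get permission on the vault's secrets, and the vault must allow template deployment access (enabledForTemplateDeployment).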
7. Validate templates locally before pushing
- Run bicep build and bicep linter locally and in CI on every PR.
- Run az deployment what-if against a test RG to preview changes early.
- Use static analysis to catch invalid property names, wrong API versions, and parameter mismatches.
- Automate local checks in pre-commit hooks to surface errors early to the author.
- Document test commands in repository README to standardize validation steps for contributors.
- Example A: bicep build ./main.bicep && bicep lint ./modules/network.bicep
- Example B: az deployment group what-if --resource-group rg-test --template-file main.json --parameters @dev.parameters.json
- Example C: pre-merge GitHub job that runs build+what-if and posts results as PR comment.
8. Use CI checks for linting, build and what-if
- Enforce bicep build, linter and what-if checks as mandatory PR status checks.
- Run unit tests for any template logic (where applicable) and small integration deploys to ephemeral RGs.
- Fail PRs that introduce parameter schema changes without documentation or migration steps.
- Use artifact storage for what-if and build outputs to allow auditors to inspect attempted changes.
- Require at least one approver with infra context for production-impacting PRs.
- Example A: GitHub Actions workflow: validate job runs bicep build and bicep linter.
- Example B: CI posts what-if JSON as workflow artifact and PR comment summarizing delta.
- Example C: CI cancels redundant runs with concurrency groups to save resources.
9. Apply policy-as-code and guardrails
- Deploy Azure Policy assignments via IaC to enforce tags, allowed SKUs and locations.
- Use initiatives to group policies relevant to environments (dev vs prod) and apply at correct scope.
- Automate remediation tasks for non-compliant resources where safe (append required tags).
- Expose policy evaluation results in central dashboard and alert on non-compliant drift.
- Include policy checks in CI to fail deployments that violate guardrails before they reach Azure.
- Example A: Bicep module that creates a policy assignment to require cost center tag.
- Example B: Policy to deny public IPs in production subscriptions.
- Example C: CI pre-deploy policy check job calling Azure Policy REST APIs for expected assignments.
10. Maintain idempotency and declarative expectations
- Design templates so repeated deployments converge to same desired state; avoid imperative provisioning in templates.
- Use resource locks for critical resources to prevent accidental deletions during automated runs.
- Avoid embedding one-off scripts that alter runtime state; prefer a separate data-migration pipeline.
- When using deploymentMode 'Complete' be explicit about intended deletions and gate these operations.
- Test idempotency by running the same deploy multiple times in sandbox and confirming no diffs.
- Example A: Use Incremental mode for day-to-day and Complete only for tear-down pipelines with explicit approvals.
- Example B: Bicep templates avoid runCommands or custom script extensions that mutate persisted data.
- Example C: Implement resource locks around storage accounts or Key Vaults in production template outputs.
11. Tag and inventory every deployed resource
- Require tags (owner, environment, costCenter, lifecycle) via policy and module defaults.
- Emit tags as outputs where systems need to reconcile ownership programmatically.
- Use Resource Graph queries daily to build an inventory and surface missing tags.
- Integrate tag inventory into billing reports and owner notifications for stale resources.
- Fail PRs that remove or change required tags without a documented business reason.
- Example A: declare param tags object once and apply tags: tags on every resource.
- Example B: Scheduled job queries Resource Graph and writes CSV of missing tags to storage.
- Example C: Alerting rule triggers when resources without "owner" tag are created in prod subscription.
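A lightweight stand-in for the tag checks above: a shell function that verifies required tag keys in a space-separated "key=value" list (the required keys mirror the bullet list; a real gate would query Resource Graph or policy state):

```shell
# require_tags fails, naming the first missing key, when any required tag is absent.
require_tags() {
  local tags=" $1 " key
  for key in owner environment costCenter; do
    case "$tags" in
      *" $key="*) ;;  # key present, keep checking
      *) echo "missing tag: $key"; return 1 ;;
    esac
  done
  echo "tags ok"
}

require_tags "owner=teamA environment=prod costCenter=cc123"   # prints: tags ok
```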
12. Version modules and use semantic versioning
- Publish modules with semantic versions and reference exact versions from root templates.
- Avoid using 'latest' or branch names for production deployments to prevent unexpected upgrades.
- Maintain changelog per module and require release notes when bumping major versions.
- Use CI to publish module artifacts to registry and keep tag-to-module mapping in README.
- Test new module versions in staging with smoke tests before promoting to prod references.
- Example A: Tag module v1.0.0 in repo and publish to GitHub Packages.
- Example B: root.bicep references br:myregistry.azurecr.io/bicep/modules/network:1.0.0 rather than a branch ref.
- Example C: Release workflow builds module artifact and creates release notes automatically.
13. Use CI/CD with protected environments and approvals
- Create protected GitHub Environments for staging and production with required reviewers and secrets.
- Require manual approval for production deployments and log approver identity and ticket reference.
- Use environment-based secrets scoped to environment rather than repository-wide secrets.
- Enforce branch protections: PR reviews, passing checks, and no direct pushes to protected branches.
- Implement canary or phased production deployment pipelines to limit blast radius of infra changes.
- Example A: GitHub environment "production" requires two approvers and uses OIDC federated token.
- Example B: Deploy job requires environment promotion step and posts deployment runbook link to ticket.
- Example C: Staging auto-deploy job triggers integration tests and only on success allows manual prod approval.
14. Implement drift detection and scheduled what-if
- Schedule regular what-if runs and persist results to artifacts for trend analysis and audit.
- Alert on unexpected deltas (new resources, changed SKUs) and assign to owners for remediation.
- Use deployment history to track manual changes and require IaC reconciliation or remediation jobs.
- Provide dashboards showing percent drift and categorized delta types for prioritization.
- Automate remediation where safe (e.g., missing tags) but keep destructive remediation behind approvals.
- Example A: Nightly GitHub Action runs az deployment what-if and uploads delta JSON to storage.
- Example B: Alert posted to Teams channel if what-if shows > X resource changes in prod RG.
- Example C: Runbook to reconcile drift by re-applying canonical parameter file or tagging fixers automatically.
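Example B's threshold check can be approximated without jq by counting non-NoChange entries in the persisted what-if JSON (what-if change entries carry a changeType field with values such as Create, Modify, Delete, NoChange):

```shell
# change_count counts what-if entries whose changeType is anything other than
# "NoChange"; an alerting job compares this against its threshold (X).
change_count() {
  grep -o '"changeType": *"[A-Za-z]*"' "$1" | grep -c -v 'NoChange' || true
}

# Usage in the nightly job (threshold and file name illustrative):
#   [ "$(change_count whatif.json)" -gt 5 ] && post_teams_alert
```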
15. Keep state handling transparent (use ARM with care)
- Document any stateful operations and keep them out of declarative templates if they require non-idempotent steps.
- Use migration pipelines for data changes and schedule them with maintenance windows and backups.
- Expose deployment outputs and store them as artifacts for future debugging and rollbacks.
- Avoid naming resources using unpredictable or changing tokens that break idempotency across deployments.
- Use explicit parameter-driven names for resources that must be stable across reprovisioning.
- Example A: Separate "schema migration" job in pipeline that runs after resource creation and is idempotent.
- Example B: Export and store deployment outputs artifact with timestamps and version tags.
- Example C: Use a naming module that derives names from fixed inputs to prevent accidental renames.
16. Make deployments observable and auditable
- Log deployment inputs, outputs and runtime events to a central store for auditing and post-mortem.
- Publish what-if and deployment templates as artifacts in CI for later review.
- Keep deployment logs (Az CLI output) as job artifacts in GitHub Actions or your CI provider.
- Correlate infra changes with application incidents using tags and deployment timestamps.
- Implement RBAC and logging on pipeline actors and ensure audit events are retained per policy.
- Example A: Action stores what-if JSON and a summary as PR comment and artifact.
- Example B: Export Azure Deployment JSON and push to a secure artifact storage for auditors.
- Example C: Create dashboard correlating deployment time and number of incidents in the next hour.
17. Use tests: unit (bicep linter), integration (what-if), and smoke
- Run bicep linter and syntax checks as part of PR validation.
- Use what-if as an integration test to understand resource deltas without committing changes.
- Deploy ephemeral stacks to a test RG for end-to-end smoke tests when realistic validation is required.
- Automate teardown of ephemeral resources to avoid cost leakage and orphaned resources.
- Record test results and use them as gating criteria for promotion to next environment.
- Example A: CI job deploys to rg-test, runs smoke tests, then tears down if successful.
- Example B: bicep linter run producing a compact report uploaded as Action artifact.
- Example C: Integration test asserting storage account accessible via private endpoint from test VM.
18. Keep templates readable and documented
- Document module contract in a MODULE_README.md adjacent to module implementation.
- Add parameter and output descriptions directly in Bicep using comments for discoverability.
- Keep modules small and refer to examples of usage inside the repo for quick onboarding.
- Embed CI sample commands in README to lower friction for contributors running local validations.
- Require PR descriptions to include a short summary of changes and impact to environments.
- Example A: MODULE_README.md with param examples and required permissions.
- Example B: root README showing sample CLI deploy command for dev and prod.
- Example C: Inline code comments in bicep modules describing rationale for non-obvious choices.
19. Secure CI/CD consoles and enforce MFA / OIDC
- Use OIDC to federate GitHub Actions -> Azure to remove stored long-lived secrets.
- Lock down CI accounts and enforce MFA and conditional access at org level.
- Restrict who can create or modify environment secrets and require approvals for changes.
- Rotate service principal credentials when OIDC is not possible and audit use frequently.
- Limit Action runner permissions and avoid wide-scoped PATs or tokens in workflows.
- Example A: Configure Azure AD federated credential for GitHub Actions OIDC trust.
- Example B: Use azure/login@v1 in Actions with OIDC; no client secret stored.
- Example C: GitHub org enforces SAML and MFA for all user logins with enforced device policies.
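Examples A and B above combine into a small workflow fragment (secret names, environment name, and action version are assumptions; the federated credential must already exist in Azure AD):

```yaml
permissions:
  id-token: write   # required for the OIDC token exchange
  contents: read

jobs:
  deploy:
    runs-on: ubuntu-latest
    environment: production
    steps:
      - uses: azure/login@v2
        with:
          client-id: ${{ secrets.AZURE_CLIENT_ID }}
          tenant-id: ${{ secrets.AZURE_TENANT_ID }}
          subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}
          # no client secret: trust comes from the federated credential
```

Note the values stored as secrets here are identifiers, not credentials; the short-lived token is minted per run.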
20. Prefer parameterized location and SKU strategies
- Parameterize location and SKU values and constrain using allowedValues to prevent mistakes.
- Use policy to prevent unsupported SKUs in production subscriptions.
- Maintain an authoritative allowed-locations document and keep it versioned near IaC code.
- Provide environment-specific SKU maps (dev->small, prod->medium/large) in parameters.
- Fail fast: CI validation should reject unsupported combinations before deployment.
- Example A: @allowed(['Standard_LRS','Premium_LRS']) param sku string
- Example B: policy assignment denies unsupported regions for prod subscription.
- Example C: param file maps env->sku and CI job selects appropriate mapping by branch.
21. Protect destructive operations with manual gates
- Require explicit user input flags for pipeline steps that perform delete/complete deployment modes.
- Use environment protection rules and manual approvals for destructive job runs.
- Keep destructive operations in separate pipelines or workflows with clear naming and approval flows.
- Maintain a soft-delete and backup policy for critical resources before destructive actions proceed.
- Record audit trail and ticket references for each manual destructive action taken by the pipeline.
- Example A: deploy workflow requires input confirm_delete=true to run delete job.
- Example B: manual "destroy" job only accessible to custodian group via environment protection.
- Example C: pipeline snapshot of infra state is stored before a destructive change for rollback.
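The confirm_delete gate from Example A reduces to a few lines of shell; in GitHub Actions the argument would come from a workflow_dispatch input (input name is illustrative):

```shell
# confirm_destroy refuses to proceed unless the caller explicitly passed "true".
confirm_destroy() {
  if [ "$1" = "true" ]; then
    echo "confirmed: proceeding with destructive step"
  else
    echo "refusing: pass confirm_delete=true to run this step" >&2
    return 1
  fi
}

# In a workflow step:
#   confirm_destroy "${{ inputs.confirm_delete }}" && az group delete ...
```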
22. Centralize shared state and naming conventions
- Keep naming rules in a single module or file consumed by all modules to eliminate divergence.
- Validate naming via CI step and fail PRs that deviate from the canonical format.
- Use deterministic uniqueString() patterns where names must be globally unique yet stable across deployments.
- Document naming exceptions and make them explicit via parameter flags, not hidden logic.
- Store canonical mapping (prefixes, environment codes) in a central configuration store and version it.
- Example A: libs/naming.bicep module that returns standardized resource names.
- Example B: CI job runs a script that validates names against naming policy regex.
- Example C: central naming map file used to generate sample URIs for docs and dashboards.
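The regex validation in Example B could look like this (the convention type-env-region-app and the allowed prefixes are hypothetical; substitute your canonical format):

```shell
# valid_name returns 0 when a name matches the convention <type>-<env>-<region>-<app>;
# the CI step fails the PR when it returns non-zero.
valid_name() {
  printf '%s\n' "$1" | grep -Eq '^(rg|kv|vnet|st)-(dev|stage|prod)-[a-z]+-[a-z0-9]{2,20}$'
}

valid_name "rg-prod-eastus-payments" && echo "name ok"   # prints: name ok
```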
23. Automate rollback and provide a recovery plan
- Keep last-known-good templates/artifacts and provide a single-click mechanism to redeploy them.
- Document and test rollback runbooks as code and store them with IaC artifacts.
- Provide a "quick deploy by tag" pipeline to roll back to a specific release artifact.
- Automate backups for critical data ahead of infra changes and verify restore procedures periodically.
- Include validation tests in rollback pipelines to confirm service restored to expected behavior.
- Example A: GitHub action "rollback" job deploys root.bicep from tag v1.2.3.
- Example B: automated snapshot of storage before schema migration job runs.
- Example C: documented rollback playbook with exact CLI commands and expected validation checks.
24. Use canonical module registry and package modules
- Publish modules to a registry and consume by exact version to prevent copy/paste drift.
- Implement promotion process: publish to staging registry first, run smoke tests, then promote to prod registry.
- Keep module CI that runs unit tests and security scans before publishing artifacts.
- Document breaking changes and provide migration guides when publishing major version bumps.
- Automate dependency updates in root templates and open PRs for module version bumps with validation jobs.
- Example A: publish module to GitHub Packages and reference package version in root.bicep.
- Example B: module release pipeline tags repo, builds artifact, publishes and updates registry index.
- Example C: automated Dependabot PRs for module minor/patch updates with CI validation.
25. Continuously improve via post-deploy reviews and telemetry
- Run a short post-deploy review after changes to capture lessons and required follow-ups.
- Collect telemetry on deployment failures and categorize root causes to drive improvements.
- Automate issue creation for recurring pipeline flakiness and prioritize fixes as part of sprint work.
- Maintain a deployment health dashboard showing success rate, mean time to recovery and incident correlation.
- Schedule periodic retros and refresh documentation and runbooks based on recurring pain points.
- Example A: CI artifact analysis job extracts common error messages and files tickets automatically.
- Example B: dashboard shows deployment success rate and average rollback time for past 90 days.
- Example C: monthly learning notes published to team Wiki with action items and owners.
Azure Portal and GitHub: exact clicks reference
Create or connect a GitHub repo
- GitHub: Repositories → New → name repo → choose visibility → initialize README; create repo.
- GitHub: Settings → Branches → Protection rules → require PR reviews and status checks for main branch.
- GitHub: Settings → Environments → create staging/production environments, add secrets and required reviewers.
- Azure Portal: Deployment Center → Source → select GitHub → authorize Azure to access repo → choose branch.
- Azure Portal: Deployment Center → Configure build provider (GitHub Actions) → confirm workflow template and commit.
Create Key Vault and set access policy
- Azure Portal: Create → Key Vault → provide name, region, pricing tier → Create.
- Networking tab: choose public access or private endpoint; if private, create/associate a Private Endpoint and private DNS zone.
- Access policies / Access control (IAM): grant Get/List for secrets to required managed identities or service principals.
- Enable purge protection and soft-delete to protect secrets from accidental or malicious deletion.
- Test access: from a VM/managed identity, fetch a secret via CLI or SDK to verify permissions and network path.
Register OIDC trust for GitHub Actions (Federated credential)
- Azure AD → App registrations → New registration or select existing app used by CI/CD.
- Certificates & secrets → Federated credentials → + Add federated credential → issuer https://token.actions.githubusercontent.com.
- Configure subject conditions (repo:org/repo:ref) to scope the trust to specific repo/branch/environment.
- Save and verify: use GitHub Actions azure/login with OIDC provider to confirm token exchange works.
- Audit: review the federated credential's creation event in Azure AD logs and record CI pipeline mapping in docs.
Create GitHub Environment with required reviewers
- GitHub repo → Settings → Environments → New environment → name (production/staging).
- Add required reviewers and specify deployment branch protection (only specific branches may deploy).
- Add environment secrets (e.g., PROD_SUBSCRIPTION_ID) and restrict secret access to selected workflows.
- Set required wait timer or custom review step to ensure cool-off period before production deploys.
- Test: trigger a workflow that targets the environment and ensure the approval prompt appears as configured.
Assign scoped RBAC to the CI principal
- Azure Portal: Subscription or Resource Group → Access control (IAM) → + Add role assignment.
- Select minimal role (Contributor on RG) and pick the CI service principal or federated identity as principal.
- Record role assignment justification and ticket reference in CMDB and enforce periodic review.
- Test deploy with the principal in a staging RG to validate permissions before granting prod access.
- Use Azure AD sign-in logs and access reviews to keep assignments lean and current.
Quick operational notes and practical reminders
Useful PowerShell / Az & CLI snippets
Run these in Cloud Shell, local dev machines, or CI jobs. Replace placeholders before use.
# 1. Sign in (interactive)
Connect-AzAccount
# 2. Sign in (service principal)
az login --service-principal -u APP_ID -p SECRET --tenant TENANT_ID
# 3. Select subscription
Select-AzSubscription -SubscriptionId "SUBSCRIPTION_ID"
az account set --subscription "SUBSCRIPTION_ID"
# 4. Install/upgrade Bicep
az bicep install
az bicep version
# 5. Build Bicep to ARM JSON
bicep build ./main.bicep --outdir ./compiled
# 6. Lint Bicep (requires bicep linter)
bicep lint ./modules/network.bicep
# linter rules also run during bicep build; configure rule severities in bicepconfig.json
# 7. Validate template (deployment validate)
az deployment group validate --resource-group rg-test --template-file ./main.json --parameters @./parameters/dev.parameters.json
# 8. What-if preview
az deployment group what-if --resource-group rg-test --template-file ./main.json --parameters @./parameters/dev.parameters.json
# 9. Create resource group
az group create --name rg-prod --location eastus
# 10. Deploy Bicep to RG
az deployment group create --resource-group rg-prod --template-file ./main.bicep --parameters @./parameters/prod.parameters.json
# 11. Deploy at subscription scope
az deployment sub create --location eastus --template-file ./main.json --parameters @./parameters/sub.parameters.json
# 12. Create service principal for CI (scoped)
az ad sp create-for-rbac --name "ci-deployer" --role "Contributor" --scopes /subscriptions/SUBSCRIPTION_ID/resourceGroups/rg-deploy
# 13. Create federated credential via az rest (example)
az rest --method POST --uri "https://graph.microsoft.com/v1.0/applications/APP_ID/federatedIdentityCredentials" --body @federated.json
# 14. Export resource group template
az group export --name rg-prod --output json > rg-prod-export.json
# 15. List resource groups
az group list -o table
# 16. Query Resource Graph for missing tags
az graph query -q "Resources | where isnull(tags.owner)" --first 1000
# 17. Add diagnostic setting
az monitor diagnostic-settings create --resource /subscriptions/SUBSCRIPTION_ID/resourceGroups/rg/providers/Microsoft.Web/sites/myapp --workspace /subscriptions/.../resourcegroups/rg-logs/providers/microsoft.operationalinsights/workspaces/la-aks --name diagSettings --logs '[{"category":"AppServiceHTTPLogs","enabled":true}]'
# 18. Get deployment operations
az deployment group operation list --resource-group rg-prod --name deploymentName -o table
# 19. Lock a resource (prevent deletes)
az lock create --name "NoDelete" --resource-group rg-prod --resource-name myResource --resource-type "Microsoft.Storage/storageAccounts" --lock-type CanNotDelete
# 20. Remove lock
az lock delete --name "NoDelete" --resource-group rg-prod
# 21. Set resource tags
az resource tag --tags owner=teamA environment=prod --ids /subscriptions/.../resourceGroups/rg-prod/providers/Microsoft.Storage/storageAccounts/mystorage
# 22. Show resource details
az resource show --ids /subscriptions/.../resourceGroups/rg-prod/providers/Microsoft.Storage/storageAccounts/mystorage
# 23. Run ARM template deployment in complete mode (use with caution)
az deployment group create --resource-group rg-prod --template-file ./main.bicep --mode Complete --parameters @./parameters/prod.parameters.json
# 24. Get what-if change list as JSON (for persisting as an artifact)
az deployment group what-if --resource-group rg-prod --template-file ./main.json --parameters @./parameters/prod.parameters.json --no-pretty-print -o json
# 25. Clean up ephemeral test RG (teardown)
az group delete --name rg-test --yes --no-wait
Git / GitHub common commands
Local git + GitHub CLI (gh) commands useful for IaC workflows and release management.
# 1. Clone repo (SSH)
git clone git@github.com:org/repo.git
# 2. Clone repo (HTTPS)
git clone https://github.com/org/repo.git
# 3. Create feature branch
git checkout -b feat/iac-module
# 4. Stage changes
git add .
# 5. Commit changes
git commit -m "add network module"
# 6. Amend commit (if needed)
git commit --amend --no-edit
# 7. Push branch
git push origin feat/iac-module
# 8. Create PR with gh
gh pr create --title "Add network module" --body "This PR adds network.bicep" --base main --head feat/iac-module
# 9. List PRs
gh pr list --state open
# 10. View PR status
gh pr view <pr-number> --web
# 11. Merge PR (squash)
gh pr merge <pr-number> --squash --delete-branch
# 12. Rebase branch onto main
git fetch origin
git rebase origin/main
# 13. Force-push (use carefully)
git push --force-with-lease origin feat/iac-module
# 14. Create tag for release
git tag -a v1.2.0 -m "module release v1.2.0"
git push origin v1.2.0
# 15. Create GitHub release
gh release create v1.2.0 --title "v1.2.0" --notes "Release notes here"
# 16. Create environment in repo via API/gh (example)
gh api --method PUT repos/:owner/:repo/environments/production
# 17. Set repository secret (GH CLI)
gh secret set AZURE_SUBSCRIPTION_ID --body "SUBSCRIPTION_ID"
# 18. List repo secrets
gh secret list
# 19. Download artifact from workflow run
gh run download <run-id> --name artifact-name
# 20. Cancel workflow run
gh run cancel <run-id>
# 21. View workflow runs
gh run list --workflow iac-ci.yml
# 22. Trigger workflow dispatch (manual)
gh workflow run iac-ci.yml --ref main --field env=staging
# 23. Check branch protection rules (API)
gh api repos/:owner/:repo/branches/main/protection
# 24. Update branch protection rule via API (PUT with a JSON payload file)
gh api --method PUT repos/:owner/:repo/branches/main/protection --input protection.json
# 25. Create and manage GitHub Actions workspaces artifacts (upload)
# In workflow: uses: actions/upload-artifact@v4 with name: what-if-output
Docker commands
Image build, registry, runtime, inspect and maintenance commands used in CI and local development.
# 1. Build an image with tag
docker build -t myapp:local .
# 2. Build with build-arg and target
docker build --build-arg NODE_ENV=production --target runtime -t myapp:prod .
# 3. List local images
docker images --format "table {{.Repository}}\t{{.Tag}}\t{{.Size}}"
# 4. Show image history
docker history myapp:local
# 5. Run container interactively
docker run --rm -it -p 8080:80 --name myapp-run myapp:local
# 6. Run detached with env
docker run -d -p 8080:80 --name myapp -e "ENV=dev" myapp:local
# 7. Run with mounted volume (dev)
docker run --rm -v "$(pwd)":/app -w /app node:18 npm start
# 8. Tag image for registry
docker tag myapp:local myregistry.azurecr.io/myapp:1.0.0
# 9. Login to ACR
az acr login --name myregistry
# or docker login myregistry.azurecr.io
# 10. Push image to registry
docker push myregistry.azurecr.io/myapp:1.0.0
# 11. Pull image from registry
docker pull myregistry.azurecr.io/myapp:1.0.0
# 12. List running containers
docker ps --format "table {{.Names}}\t{{.Image}}\t{{.Status}}"
# 13. Show container logs (follow)
docker logs -f myapp
# 14. Exec into running container
docker exec -it myapp /bin/sh
# 15. Inspect container
docker inspect myapp
# 16. Remove a container (stopped)
docker rm myapp
# 17. Remove an image
docker rmi myapp:local
# 18. Prune unused images/containers
docker system prune --volumes --force
# 19. Save image to tar
docker save myregistry.azurecr.io/myapp:1.0.0 -o myapp_1.0.0.tar
# 20. Load image from tar
docker load -i myapp_1.0.0.tar
# 21. Build multi-platform image (buildx)
docker buildx build --platform linux/amd64,linux/arm64 -t myregistry.azurecr.io/myapp:1.0.0 --push .
# 22. Create network
docker network create iac-net
# 23. Run container attached to network
docker run -d --network iac-net --name myapp-net myapp:local
# 24. Inspect image labels
docker image inspect myapp:local --format '{{ json .Config.Labels }}'
# 25. Set resource limits on run
docker run -d --memory=512m --cpus=0.5 --name myapp-limited myapp:local
AKS commands (az aks)
AKS cluster lifecycle, node pool, addons, RBAC and maintenance commands via Azure CLI.
# 1. Create AKS cluster (basic)
az aks create -g rg-aks -n aks-prod --node-count 3 --enable-managed-identity --network-plugin azure
# 2. Create cluster with ACR integration
az aks create -g rg-aks -n aks-prod --node-count 3 --attach-acr myregistry --enable-managed-identity
# 3. Get credentials for kubectl (merge)
az aks get-credentials -g rg-aks -n aks-prod --overwrite-existing
# 4. List AKS clusters
az aks list -o table
# 5. Show AKS cluster details
az aks show -g rg-aks -n aks-prod -o json
# 6. Create a node pool
az aks nodepool add --resource-group rg-aks --cluster-name aks-prod --name np-linux --node-count 3 --kubernetes-version 1.27.4
# 7. Scale node pool
az aks nodepool scale --resource-group rg-aks --cluster-name aks-prod --name nodepool1 --node-count 5
# 8. Upgrade control plane
az aks upgrade --resource-group rg-aks --name aks-prod --kubernetes-version 1.27.4
# 9. Upgrade a node pool
az aks nodepool upgrade --resource-group rg-aks --cluster-name aks-prod --name nodepool1 --kubernetes-version 1.27.4
# 10. Enable monitoring (Container Insights)
az aks enable-addons -g rg-aks -n aks-prod --addons monitoring --workspace-resource-id /subscriptions/.../resourcegroups/rg-logs/providers/microsoft.operationalinsights/workspaces/la-aks
# 11. Enable AAD integration
az aks update -g rg-aks -n aks-prod --enable-aad --aad-admin-group-object-ids <group-id>
# 12. Enable workload identity (Azure AD workload identity with OIDC issuer)
az aks update -g rg-aks -n aks-prod --enable-oidc-issuer --enable-workload-identity
# 13. Rotate cluster certificates
az aks rotate-certs --resource-group rg-aks --name aks-prod
# 14. Scale cluster (node count for default pool)
az aks scale -g rg-aks -n aks-prod --node-count 4
# 15. Get node pool list
az aks nodepool list --resource-group rg-aks --cluster-name aks-prod -o table
# 16. List available Kubernetes upgrade versions for the cluster
az aks get-upgrades --resource-group rg-aks --name aks-prod -o table
# 17. Disable autoscaler on a node pool
az aks nodepool update --resource-group rg-aks --cluster-name aks-prod --name nodepool1 --disable-cluster-autoscaler
# 18. Enable cluster-autoscaler
az aks nodepool update --resource-group rg-aks --cluster-name aks-prod --name nodepool1 --enable-cluster-autoscaler --min-count 3 --max-count 10
# 19. Get cluster credentials and write to file
az aks get-credentials -g rg-aks -n aks-prod --file ./kubeconfig-aks-prod
# 20. Upgrade node image only (reimage nodes without changing Kubernetes version)
az aks nodepool upgrade --resource-group rg-aks --cluster-name aks-prod --name nodepool1 --node-image-only
# 21. Enable HTTP application routing addon (example)
az aks enable-addons --resource-group rg-aks --name aks-prod --addons http_application_routing
# 22. Enable Azure RBAC for Kubernetes authorization (requires AAD-enabled cluster)
az aks update -g rg-aks -n aks-prod --enable-azure-rbac
# 23. Delete node pool
az aks nodepool delete --resource-group rg-aks --cluster-name aks-prod --name np-linux
# 24. Delete AKS cluster
az aks delete -g rg-aks -n aks-prod --yes --no-wait
# 25. AKS diagnostics (run a command inside the cluster, e.g. on private clusters)
az aks command invoke -g rg-aks -n aks-prod --command "kubectl get pods --all-namespaces -o wide"
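A common maintenance sequence combines several of the commands above: check which versions are available, upgrade the control plane first (AKS requires the control plane version to be at or above node pool versions), then upgrade each node pool. A dry-run sketch under the same hypothetical names (rg-aks, aks-prod, nodepool1) used in the list:

```shell
set -euo pipefail

RG="rg-aks"
CLUSTER="aks-prod"
POOL="nodepool1"
TARGET_VERSION="1.27.4"   # hypothetical target; pick one reported by get-upgrades
DRY_RUN="${DRY_RUN:-1}"

run() { if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi; }

# 1. Check which versions the cluster can move to.
run az aks get-upgrades --resource-group "$RG" --name "$CLUSTER" -o table
# 2. Upgrade the control plane first.
run az aks upgrade --resource-group "$RG" --name "$CLUSTER" --kubernetes-version "$TARGET_VERSION"
# 3. Then upgrade each node pool to match.
run az aks nodepool upgrade --resource-group "$RG" --cluster-name "$CLUSTER" --name "$POOL" --kubernetes-version "$TARGET_VERSION"
```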
kubectl commands
kubectl commands for inspection, debugging, rollouts, and port-forwarding in AKS or any Kubernetes cluster.
# 1. Show cluster context and current namespace
kubectl config current-context
kubectl config get-contexts
# 2. List nodes
kubectl get nodes -o wide
# 3. Describe a node
kubectl describe node aks-nodepool1-12345678-vmss000000
# 4. List pods in namespace
kubectl get pods -n prod
# 5. List pods across namespaces
kubectl get pods --all-namespaces
# 6. Describe a pod
kubectl describe pod myapp-abc123 -n prod
# 7. View logs for a pod (single container)
kubectl logs pod/myapp-abc123 -n prod
# 8. Tail logs (follow)
kubectl logs -f deployment/myapp -n prod
# 9. Exec into a pod
kubectl exec -it deployment/myapp -n prod -- /bin/sh
# 10. Apply manifest
kubectl apply -f k8s/deployment.yaml -n prod
# 11. Delete resource
kubectl delete -f k8s/obsolete.yaml -n prod
# 12. Diff local manifest vs cluster
kubectl diff -f k8s/deployment.yaml -n prod
# 13. Scale a deployment
kubectl scale deployment myapp --replicas=5 -n prod
# 14. Rollout status
kubectl rollout status deployment/myapp -n prod
# 15. Rollout undo
kubectl rollout undo deployment/myapp -n prod
# 16. Get service details
kubectl get svc myapp-service -n prod -o yaml
# 17. Port-forward service to local
kubectl port-forward svc/myapp-service 8080:80 -n prod
# 18. Describe ingress
kubectl describe ingress myapp-ingress -n prod
# 19. Apply patch to resource
kubectl patch deployment myapp -n prod --type='json' -p='[{"op":"replace","path":"/spec/replicas","value":3}]'
# 20. Set image of deployment (rolling update)
kubectl set image deployment/myapp myapp=registry.azurecr.io/myapp:1.0.1 -n prod
# 21. Create secret from literal
kubectl create secret generic db-creds --from-literal=username=app --from-literal=password=secret -n prod
# 22. Get events (recent)
kubectl get events -n prod --sort-by='.metadata.creationTimestamp'
# 23. Top nodes and pods (metrics-server required)
kubectl top nodes
kubectl top pods -n prod
# 24. Port-forward to pod for debugging
kubectl port-forward pod/myapp-abc123 3000:3000 -n prod
# 25. Run a one-off interactive pod (removed on exit; kubectl flags must precede --)
kubectl run tmp-shell -n prod --rm -it --image=alpine -- /bin/sh
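The inspection and rollout commands above combine into a typical failed-rollout triage flow. A dry-run sketch, assuming a hypothetical deployment myapp in namespace prod with an app=myapp label:

```shell
set -euo pipefail

NS="prod"         # hypothetical namespace
DEPLOY="myapp"    # hypothetical deployment name
DRY_RUN="${DRY_RUN:-1}"

run() { if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi; }

# 1. Is the rollout stuck? Bound the wait so CI does not hang.
run kubectl rollout status "deployment/${DEPLOY}" -n "$NS" --timeout=60s
# 2. Recent events often explain scheduling or image-pull failures.
run kubectl get events -n "$NS" --sort-by='.metadata.creationTimestamp'
# 3. Inspect pod state and logs, including the previous (crashed) container.
run kubectl get pods -n "$NS" -l "app=${DEPLOY}"
run kubectl logs "deployment/${DEPLOY}" -n "$NS" --previous
# 4. If the new version is bad, roll back to the prior revision.
run kubectl rollout undo "deployment/${DEPLOY}" -n "$NS"
```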
Runbook: recommended CI workflow
Stage 1 - Pull Request validation
- Lint and static checks: run bicep linter, style checks and custom naming validators.
- Build and compilation: run bicep build to generate ARM JSON and fail on build errors.
- What-if preview: run az deployment what-if and summarize resource deltas in PR comment.
- Unit tests and small integration tests: run any available template unit tests locally/CI (mocked resources).
- Policy checks: validate required policy assignments are present and the change does not violate guardrails.
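The Stage 1 checks above map to a handful of CLI calls. A minimal PR-validation sketch, assuming a hypothetical root template main.bicep, parameter file main.staging.parameters.json and resource group rg-staging; DRY_RUN=1 prints the commands for review:

```shell
set -euo pipefail

RG="rg-staging"                          # hypothetical target resource group
TEMPLATE="main.bicep"                    # hypothetical root template
PARAMS="main.staging.parameters.json"    # hypothetical parameter file
DRY_RUN="${DRY_RUN:-1}"

run() { if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi; }

# Compile; bicep build runs the linter and fails the job on errors.
run az bicep build --file "$TEMPLATE"
# Preview resource deltas; the output can be posted as a PR comment.
run az deployment group what-if --resource-group "$RG" --template-file "$TEMPLATE" --parameters "@${PARAMS}"
```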
Stage 2 - Staging deploy
- Deploy to a staging resource group using staging parameter file and publish build artifacts.
- Run automated integration and smoke tests (end-to-end flows, dependency checks, private endpoint reachability).
- Collect and store artifacts: what-if output, deployment logs and test artifacts for audit and debugging.
- Run synthetic probes and baseline comparisons (latency, error rates) to ensure performance within SLA-like expectations.
- Notify owners and stakeholders of staging success and include rollback instructions if tests fail.
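The Stage 2 deploy-and-verify loop can be sketched in a few lines. The health endpoint URL below is a hypothetical placeholder; the same DRY_RUN convention prints commands instead of running them:

```shell
set -euo pipefail

RG="rg-staging"
TEMPLATE="main.bicep"
PARAMS="main.staging.parameters.json"
APP_URL="https://myapp-staging.example.com/healthz"   # hypothetical health endpoint
DRY_RUN="${DRY_RUN:-1}"

run() { if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi; }

# Deploy to staging with the staging parameter file; name the deployment for audit.
run az deployment group create --resource-group "$RG" --template-file "$TEMPLATE" --parameters "@${PARAMS}" --name "staging-${GITHUB_RUN_ID:-local}"
# Smoke test: fail the stage if the health endpoint does not return 2xx.
run curl --fail --silent --show-error "$APP_URL"
```

Keeping deployment names unique per run makes the deployment history usable as the audit trail the stage calls for.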
Stage 3 - Production deploy
- Require manual approval in protected environment; approver must reference change ticket and acknowledge risk.
- Deploy with prod parameter file; use feature flags or slot-based deployments where applicable to reduce impact.
- Run post-deploy validations: smoke tests, health checks, DNS resolution and private endpoint connectivity.
- Trigger monitoring validation: ensure alerts and telemetry ingestion are functioning and no unexplained spikes in errors.
- Execute automated rollback plan if critical failures detected; record outcome and create incident ticket for RCA.
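The Stage 3 deploy, post-deploy validation and rollback steps can be sketched as below. The health URL is a hypothetical placeholder, and `--rollback-on-error` (which redeploys the last successful deployment if this one fails) is one option for the automated rollback plan:

```shell
set -euo pipefail

RG="rg-prod"
TEMPLATE="main.bicep"
PARAMS="main.prod.parameters.json"
HEALTH_URL="https://myapp.example.com/healthz"   # hypothetical health endpoint
DRY_RUN="${DRY_RUN:-1}"

run() { if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi; }

# Deploy; on failure, redeploy the last successful deployment.
run az deployment group create --resource-group "$RG" --template-file "$TEMPLATE" --parameters "@${PARAMS}" --rollback-on-error

# Post-deploy validation: health check with retries before declaring success.
for attempt in 1 2 3; do
  if run curl --fail --silent --show-error "$HEALTH_URL"; then
    break
  fi
  if [ "$attempt" = "3" ]; then
    echo "health check failed after 3 attempts" >&2
    exit 1
  fi
  sleep 10
done
```

A failed health check exits nonzero so the pipeline can trigger the rollback plan and open the incident ticket for RCA.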