
Top 25 Recommendations for IaC with Bicep & ARM - Expanded

Each recommendation expanded to 5 tactical items plus 3 concrete examples; portal click steps expanded; full CLI sections added (Docker, AKS, kubectl).

Quick summary

This file expands the prior IaC playbook: each of the 25 recommendations now has five actionable sub-recommendations and three concrete examples (total 8 subitems each).
Portal & GitHub click references have five steps each. Useful commands include expanded PowerShell/CLI plus Docker (25), AKS (25) and kubectl (25) command sets.
The CI workflow runbook (3-stage) has 5 subitems per stage.

Top 25 IaC Recommendations

1. Design modular Bicep modules

  1. Split by domain: network, identity, storage, compute, monitoring (single responsibility).
  2. Keep module scope small so unit tests are feasible and side-effects are minimized.
  3. Provide stable interfaces and avoid leaking internal names/IDs to consumers.
  4. Favor composition: root template composes modules rather than inlining resources.
  5. Publish module documentation and examples adjacent to module code.
  6. Example A: /modules/network.bicep exports vnetId, subnetId, nsgId (sketched below).
  7. Example B: /modules/storage.bicep outputs storageAccountId, keyVaultRef sample.
  8. Example C: Use module registry: publish network module v1.2.0 and reference by version in root.bicep.
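
A minimal sketch of Example A, with hypothetical names, address ranges and API versions (the NSG is omitted for brevity); the module exposes only the IDs consumers need:

// modules/network.bicep (sketch; values are illustrative)
param location string
param vnetName string = 'vnet-main'

resource vnet 'Microsoft.Network/virtualNetworks@2023-04-01' = {
  name: vnetName
  location: location
  properties: {
    addressSpace: {
      addressPrefixes: [
        '10.0.0.0/16'
      ]
    }
    subnets: [
      {
        name: 'default'
        properties: {
          addressPrefix: '10.0.0.0/24'
        }
      }
    ]
  }
}

output vnetId string = vnet.id
output subnetId string = vnet.properties.subnets[0].id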

2. Define clear module interfaces (params & outputs)

  1. Use explicit, typed parameters with allowedValues for controlled inputs.
  2. Provide sensible defaults for non-sensitive parameters to reduce repetition.
  3. Export minimal outputs (IDs, connection strings, FQDNs) needed by consumers.
  4. Document each parameter purpose and validation rules in README and in param comments.
  5. Version-breaking changes require new module major version; avoid silent API drift.
  6. Example A: network.bicep constrains location with @allowed(['eastus','westeurope']) param location string (see the sketch below).
  7. Example B: storage module outputs accountName and primaryBlobEndpoint used by apps.
  8. Example C: sample parameters file main.prod.parameters.json demonstrating secret references to Key Vault.
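
A sketch of this contract style, assuming hypothetical parameter and resource names; decorators constrain and document inputs, and outputs stay minimal:

// modules/storage.bicep (sketch; values are illustrative)
@description('Deployment region, constrained to approved locations.')
@allowed(['eastus', 'westeurope'])
param location string

@description('Storage SKU; a safe default reduces repetition.')
@allowed(['Standard_LRS', 'Premium_LRS'])
param skuName string = 'Standard_LRS'

resource storage 'Microsoft.Storage/storageAccounts@2023-01-01' = {
  name: 'st${uniqueString(resourceGroup().id)}'
  location: location
  sku: {
    name: skuName
  }
  kind: 'StorageV2'
}

output accountName string = storage.name
output primaryBlobEndpoint string = storage.properties.primaryEndpoints.blob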

3. Use a single canonical root module per environment

  1. Root module composes logical modules and loads parameter files per environment.
  2. Keep environment-specific overrides in parameter files, not in code.
  3. Use CI to select the appropriate parameter file by branch or environment variable.
  4. Keep a lightweight entry-point that is easy to review and vet in PRs.
  5. Reject direct edits to root without CI validation and at least one reviewer familiar with topology.
  6. Example A: main.bicep includes modules/network.bicep and modules/storage.bicep (sketched below).
  7. Example B: main.dev.parameters.json contains dev-specific size/sku values and diagnostic flags.
  8. Example C: GitHub Actions job passes --parameters @main.prod.parameters.json on production deploy.
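
A sketch of Example A's entry point; module paths and parameter names are illustrative. CI deploys it with the per-environment parameter file, e.g. --parameters @main.prod.parameters.json:

// main.bicep (sketch)
param location string = 'eastus'

module network './modules/network.bicep' = {
  name: 'network'
  params: {
    location: location
  }
}

module storage './modules/storage.bicep' = {
  name: 'storage'
  params: {
    location: location
  }
}

output storageAccountName string = storage.outputs.accountName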

4. Keep secrets out of code; use Key Vault + managed identities

  1. Never commit secrets to Git; use Key Vault and references from runtime or pipeline secrets.
  2. Use system-assigned or user-assigned managed identities for deployments and resource access.
  3. Restrict Key Vault access by RBAC and Key Vault firewall; require approval to add new principals.
  4. Use Key Vault references for App Service and MSI-based retrieval for VMs/containers.
  5. Rotate secrets and enable purge protection/soft-delete for Key Vaults containing production secrets.
  6. Example A: use Key Vault reference in app setting: @Microsoft.KeyVault(SecretUri=...)
  7. Example B: GitHub Actions uses azure/login OIDC to get short-lived token and then fetch Key Vault secrets.
  8. Example C: CI pipeline stores non-sensitive config in parameter files and secrets in Key Vault accessed by MSI (see the sketch below).
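
A sketch combining Examples A and C: an existing Key Vault hands a secret to a module at deploy time, so no secret value ever lands in code or parameter files (vault, secret and module names are illustrative):

// consume an existing Key Vault and pass a secret to a module (sketch)
resource kv 'Microsoft.KeyVault/vaults@2023-02-01' existing = {
  name: 'kv-prod-shared'
}

module app './modules/app.bicep' = {
  name: 'app'
  params: {
    // the receiving parameter must be declared @secure() inside the module
    sqlAdminPassword: kv.getSecret('sql-admin-password')
  }
}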

5. Implement strict RBAC scoping for deployment principals

  1. Assign least-privilege roles scoped to the resource group or management group required for deploys.
  2. Prefer role assignments to groups or service principals with just-in-time elevation via PIM.
  3. Audit role assignments and require periodic access reviews for CI principals.
  4. Use Azure AD federated credentials (OIDC) instead of long-lived client secrets where possible.
  5. Block deployment principal from performing destructive operations by separating duties (deploy vs admin).
  6. Example A: SPN assigned "Contributor" only on rg-iac-prod scope, not subscription.
  7. Example B: Use PIM for just-in-time elevation of human roles and require approval for activations.
  8. Example C: GitHub Actions uses environment protection with required reviewers and OIDC to avoid stored secrets.

6. Use parameter files per environment and secret streams

  1. Keep one canonical parameters file per environment (dev/stage/prod) and validate them in CI.
  2. Do not store secrets in parameter files; reference Key Vault or CI secrets instead.
  3. Validate parameter schemas and allowed values through linting tools or custom policies.
  4. Use parameters for SKUs, locations, tags and feature toggles to avoid code changes.
  5. Secure parameter files in repo by branch protections and require PR reviews for changes to prod files.
  6. Example A: dev.parameters.json with smaller SKUs and feature flags enabled (a .bicepparam alternative is sketched below).
  7. Example B: prod.parameters.json references Key Vault secret URIs for connection strings.
  8. Example C: CI pipeline injects secrets via environment secrets and merges runtime parameters before deploy.
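
Newer Bicep CLIs also support .bicepparam files as an alternative to JSON parameter files, keeping the same one-file-per-environment pattern; a sketch with illustrative values (the parameter names assume the root template declares them):

// main.prod.bicepparam (sketch; requires a recent Bicep CLI)
using './main.bicep'

param location = 'eastus'
param skuName = 'Premium_LRS'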

7. Validate templates locally before pushing

  1. Run bicep build and bicep linter locally and in CI on every PR.
  2. Run az deployment what-if against a test RG to preview changes early.
  3. Use static analysis to catch invalid property names, wrong API versions, and parameter mismatches.
  4. Automate local checks in pre-commit hooks to surface errors early to the author.
  5. Document test commands in repository README to standardize validation steps for contributors.
  6. Example A: bicep build ./main.bicep && bicep lint ./modules/*
  7. Example B: az deployment group what-if --resource-group rg-test --template-file main.json --parameters @dev.parameters.json
  8. Example C: pre-merge GitHub job that runs build+what-if and posts results as PR comment.

8. Use CI checks for linting, build and what-if

  1. Enforce bicep build, linter and what-if checks as mandatory PR status checks.
  2. Run unit tests for any template logic (where applicable) and small integration deploys to ephemeral RGs.
  3. Fail PRs that introduce parameter schema changes without documentation or migration steps.
  4. Use artifact storage for what-if and build outputs to allow auditors to inspect attempted changes.
  5. Require at least one approver with infra context for production-impacting PRs.
  6. Example A: GitHub Actions workflow: validate job runs bicep build and bicep linter.
  7. Example B: CI posts what-if JSON as workflow artifact and PR comment summarizing delta.
  8. Example C: CI cancels redundant runs with concurrency groups to save resources.

9. Apply policy-as-code and guardrails

  1. Deploy Azure Policy assignments via IaC to enforce tags, allowed SKUs and locations.
  2. Use initiatives to group policies relevant to environments (dev vs prod) and apply at correct scope.
  3. Automate remediation tasks for non-compliant resources where safe (append required tags).
  4. Expose policy evaluation results in central dashboard and alert on non-compliant drift.
  5. Include policy checks in CI to fail deployments that violate guardrails before they reach Azure.
  6. Example A: Bicep module that creates a policy assignment to require a cost center tag (sketched below).
  7. Example B: Policy to deny public IPs in production subscriptions.
  8. Example C: CI pre-deploy policy check job calling Azure Policy REST APIs for expected assignments.
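
A sketch of Example A as a subscription-scope assignment; the definition GUID shown is believed to be the built-in "Require a tag on resources" policy, but verify it for your cloud:

// policy/require-tag.bicep (sketch)
targetScope = 'subscription'

resource requireCostCenter 'Microsoft.Authorization/policyAssignments@2022-06-01' = {
  name: 'require-costcenter-tag'
  properties: {
    policyDefinitionId: '/providers/Microsoft.Authorization/policyDefinitions/871b6d14-10aa-478d-b590-94f262ecfa99'
    parameters: {
      tagName: {
        value: 'costCenter'
      }
    }
  }
}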

10. Maintain idempotency and declarative expectations

  1. Design templates so repeated deployments converge to same desired state; avoid imperative provisioning in templates.
  2. Use resource locks for critical resources to prevent accidental deletions during automated runs.
  3. Avoid embedding one-off scripts that alter runtime state; prefer a separate data-migration pipeline.
  4. When using deploymentMode 'Complete' be explicit about intended deletions and gate these operations.
  5. Test idempotency by running the same deploy multiple times in sandbox and confirming no diffs.
  6. Example A: Use Incremental mode for day-to-day and Complete only for tear-down pipelines with explicit approvals.
  7. Example B: Bicep templates avoid runCommands or custom script extensions that mutate persisted data.
  8. Example C: Implement resource locks around storage accounts or Key Vaults in production templates (sketched below).
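
A sketch of Example C: a CanNotDelete lock scoped to an existing storage account (resource names are illustrative):

// locks.bicep (sketch)
resource storage 'Microsoft.Storage/storageAccounts@2023-01-01' existing = {
  name: 'stprodshared001'
}

resource noDelete 'Microsoft.Authorization/locks@2020-05-01' = {
  name: 'NoDelete'
  scope: storage
  properties: {
    level: 'CanNotDelete'
    notes: 'Protect production storage from accidental deletion.'
  }
}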

11. Tag and inventory every deployed resource

  1. Require tags (owner, environment, costCenter, lifecycle) via policy and module defaults.
  2. Emit tags as outputs where systems need to reconcile ownership programmatically.
  3. Use Resource Graph queries daily to build an inventory and surface missing tags.
  4. Integrate tag inventory into billing reports and owner notifications for stale resources.
  5. Fail PRs that remove or change required tags without a documented business reason.
  6. Example A: declare param tags object and apply it to every resource with tags: tags (sketched below).
  7. Example B: Scheduled job queries Resource Graph and writes CSV of missing tags to storage.
  8. Example C: Alerting rule triggers when resources without "owner" tag are created in prod subscription.
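
A sketch of Example A: one tags parameter flows to every resource, so policy and inventory see consistent values (tag values are illustrative):

// tags applied from a single parameter (sketch)
param tags object = {
  owner: 'teamA'
  environment: 'prod'
  costCenter: 'cc-1234'
}

resource storage 'Microsoft.Storage/storageAccounts@2023-01-01' = {
  name: 'st${uniqueString(resourceGroup().id)}'
  location: resourceGroup().location
  sku: {
    name: 'Standard_LRS'
  }
  kind: 'StorageV2'
  tags: tags
}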

12. Version modules and use semantic versioning

  1. Publish modules with semantic versions and reference exact versions from root templates.
  2. Avoid using 'latest' or branch names for production deployments to prevent unexpected upgrades.
  3. Maintain changelog per module and require release notes when bumping major versions.
  4. Use CI to publish module artifacts to registry and keep tag-to-module mapping in README.
  5. Test new module versions in staging with smoke tests before promoting to prod references.
  6. Example A: Tag module v1.0.0 in repo and publish to GitHub Packages.
  7. Example B: root.bicep references the network module at an exact version (v1.0.0) rather than a branch ref (see the sketch below).
  8. Example C: Release workflow builds module artifact and creates release notes automatically.
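
A sketch of Example B using Bicep's br: scheme to pin an exact published version (registry host and module path are illustrative):

// root.bicep references the registry module by version, never by branch (sketch)
module network 'br:myregistry.azurecr.io/bicep/modules/network:v1.0.0' = {
  name: 'network'
  params: {
    location: 'eastus'
  }
}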

13. Use CI/CD with protected environments and approvals

  1. Create protected GitHub Environments for staging and production with required reviewers and secrets.
  2. Require manual approval for production deployments and log approver identity and ticket reference.
  3. Use environment-based secrets scoped to environment rather than repository-wide secrets.
  4. Enforce branch protections: PR reviews, passing checks, and no direct pushes to protected branches.
  5. Implement canary or phased production deployment pipelines to limit blast radius of infra changes.
  6. Example A: GitHub environment "production" requires two approvers and uses OIDC federated token.
  7. Example B: Deploy job requires environment promotion step and posts deployment runbook link to ticket.
  8. Example C: Staging auto-deploy job triggers integration tests and only on success allows manual prod approval.

14. Implement drift detection and scheduled what-if

  1. Schedule regular what-if runs and persist results to artifacts for trend analysis and audit.
  2. Alert on unexpected deltas (new resources, changed SKUs) and assign to owners for remediation.
  3. Use deployment history to track manual changes and require IaC reconciliation or remediation jobs.
  4. Provide dashboards showing percent drift and categorized delta types for prioritization.
  5. Automate remediation where safe (e.g., missing tags) but keep destructive remediation behind approvals.
  6. Example A: Nightly GitHub Action runs az deployment what-if and uploads delta JSON to storage.
  7. Example B: Alert posted to Teams channel if what-if shows > X resource changes in prod RG.
  8. Example C: Runbook to reconcile drift by re-applying canonical parameter file or tagging fixers automatically.

15. Keep state handling transparent (use ARM with care)

  1. Document any stateful operations and keep them out of declarative templates if they require non-idempotent steps.
  2. Use migration pipelines for data changes and schedule them with maintenance windows and backups.
  3. Expose deployment outputs and store them as artifacts for future debugging and rollbacks.
  4. Avoid naming resources using unpredictable or changing tokens that break idempotency across deployments.
  5. Use explicit parameter-driven names for resources that must be stable across reprovisioning.
  6. Example A: Separate "schema migration" job in pipeline that runs after resource creation and is idempotent.
  7. Example B: Export and store deployment outputs artifact with timestamps and version tags.
  8. Example C: Use a naming module that derives names from fixed inputs to prevent accidental renames.

16. Make deployments observable and auditable

  1. Log deployment inputs, outputs and runtime events to a central store for auditing and post-mortem.
  2. Publish what-if and deployment templates as artifacts in CI for later review.
  3. Keep deployment logs (Az CLI output) as job artifacts in GitHub Actions or your CI provider.
  4. Correlate infra changes with application incidents using tags and deployment timestamps.
  5. Implement RBAC and logging on pipeline actors and ensure audit events are retained per policy.
  6. Example A: Action stores what-if JSON and a summary as PR comment and artifact.
  7. Example B: Export Azure Deployment JSON and push to a secure artifact storage for auditors.
  8. Example C: Create dashboard correlating deployment time and number of incidents in the next hour.

17. Use tests: unit (bicep linter), integration (what-if), and smoke

  1. Run bicep linter and syntax checks as part of PR validation.
  2. Use what-if as an integration test to understand resource deltas without committing changes.
  3. Deploy ephemeral stacks to a test RG for end-to-end smoke tests when realistic validation is required.
  4. Automate teardown of ephemeral resources to avoid cost leakage and orphaned resources.
  5. Record test results and use them as gating criteria for promotion to next environment.
  6. Example A: CI job deploys to rg-test, runs smoke tests, then tears down if successful.
  7. Example B: bicep linter run producing a compact report uploaded as Action artifact.
  8. Example C: Integration test asserting storage account accessible via private endpoint from test VM.

18. Keep templates readable and documented

  1. Document module contract in a MODULE_README.md adjacent to module implementation.
  2. Add parameter and output descriptions directly in Bicep using comments for discoverability.
  3. Keep modules small and refer to examples of usage inside the repo for quick onboarding.
  4. Embed CI sample commands in README to lower friction for contributors running local validations.
  5. Require PR descriptions to include a short summary of changes and impact to environments.
  6. Example A: MODULE_README.md with param examples and required permissions.
  7. Example B: root README showing sample CLI deploy command for dev and prod.
  8. Example C: Inline code comments in bicep modules describing rationale for non-obvious choices.

19. Secure CI/CD consoles and enforce MFA / OIDC

  1. Use OIDC to federate GitHub Actions -> Azure to remove stored long-lived secrets.
  2. Lock down CI accounts and enforce MFA and conditional access at org level.
  3. Restrict who can create or modify environment secrets and require approvals for changes.
  4. Rotate service principal credentials when OIDC is not possible and audit use frequently.
  5. Limit Action runner permissions and avoid wide-scoped PATs or tokens in workflows.
  6. Example A: Configure Azure AD federated credential for GitHub Actions OIDC trust.
  7. Example B: Use azure/login@v1 in Actions with OIDC; no client secret stored.
  8. Example C: GitHub org enforces SAML and MFA for all user logins with enforced device policies.

20. Prefer parameterized location and SKU strategies

  1. Parameterize location and SKU values and constrain using allowedValues to prevent mistakes.
  2. Use policy to prevent unsupported SKUs in production subscriptions.
  3. Maintain an authoritative allowed-locations document and keep it versioned near IaC code.
  4. Provide environment-specific SKU maps (dev->small, prod->medium/large) in parameters (see the sketch after this list).
  5. Fail fast: CI validation should reject unsupported combinations before deployment.
  6. Example A: @allowed(['Standard_LRS','Premium_LRS']) param sku string
  7. Example B: policy assignment denies unsupported regions for prod subscription.
  8. Example C: param file maps env->sku and CI job selects appropriate mapping by branch.
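
A sketch of the environment-to-SKU map from item 4, resolved inside the template (mapping values are illustrative):

// sku-map.bicep (sketch)
@allowed(['dev', 'stage', 'prod'])
param env string

var skuMap = {
  dev: 'Standard_LRS'
  stage: 'Standard_ZRS'
  prod: 'Premium_LRS'
}

output selectedSku string = skuMap[env]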

21. Protect destructive operations with manual gates

  1. Require explicit user input flags for pipeline steps that perform delete/complete deployment modes.
  2. Use environment protection rules and manual approvals for destructive job runs.
  3. Keep destructive operations in separate pipelines or workflows with clear naming and approval flows.
  4. Maintain a soft-delete and backup policy for critical resources before destructive actions proceed.
  5. Record audit trail and ticket references for each manual destructive action taken by the pipeline.
  6. Example A: deploy workflow requires input confirm_delete=true to run delete job.
  7. Example B: manual "destroy" job only accessible to custodian group via environment protection.
  8. Example C: pipeline snapshot of infra state is stored before a destructive change for rollback.

22. Centralize shared state and naming conventions

  1. Keep naming rules in a single module or file consumed by all modules to eliminate divergence.
  2. Validate naming via CI step and fail PRs that deviate from the canonical format.
  3. Use deterministic uniqueString() patterns where names must avoid collisions yet remain stable across deployments.
  4. Document naming exceptions and make them explicit via parameter flags, not hidden logic.
  5. Store canonical mapping (prefixes, environment codes) in a central configuration store and version it.
  6. Example A: libs/naming.bicep module that returns standardized resource names (sketched below).
  7. Example B: CI job runs a script that validates names against naming policy regex.
  8. Example C: central naming map file used to generate sample URIs for docs and dashboards.
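
A sketch of Example A: a naming module derives every name from fixed inputs so reruns never rename resources (prefix conventions are illustrative):

// libs/naming.bicep (sketch)
param workload string
param env string
param locationCode string

var base = '${workload}-${env}-${locationCode}'

// storage account names: lowercase alphanumerics only, max 24 characters
output storageAccountName string = toLower(take('st${workload}${env}${uniqueString(resourceGroup().id)}', 24))
output vnetName string = 'vnet-${base}'
output keyVaultName string = 'kv-${base}'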

23. Automate rollback and provide a recovery plan

  1. Keep last-known-good templates/artifacts and provide a single-click mechanism to redeploy them.
  2. Document and test rollback runbooks as code and store them with IaC artifacts.
  3. Provide a "quick deploy by tag" pipeline to roll back to a specific release artifact.
  4. Automate backups for critical data ahead of infra changes and verify restore procedures periodically.
  5. Include validation tests in rollback pipelines to confirm service restored to expected behavior.
  6. Example A: GitHub action "rollback" job deploys root.bicep from tag v1.2.3.
  7. Example B: automated snapshot of storage before schema migration job runs.
  8. Example C: documented rollback playbook with exact CLI commands and expected validation checks.

24. Use canonical module registry and package modules

  1. Publish modules to a registry and consume by exact version to prevent copy/paste drift.
  2. Implement promotion process: publish to staging registry first, run smoke tests, then promote to prod registry.
  3. Keep module CI that runs unit tests and security scans before publishing artifacts.
  4. Document breaking changes and provide migration guides when publishing major version bumps.
  5. Automate dependency updates in root templates and open PRs for module version bumps with validation jobs.
  6. Example A: publish module to GitHub Packages and reference package version in root.bicep.
  7. Example B: module release pipeline tags repo, builds artifact, publishes and updates registry index.
  8. Example C: automated Dependabot PRs for module minor/patch updates with CI validation.

25. Continuously improve via post-deploy reviews and telemetry

  1. Run a short post-deploy review after changes to capture lessons and required follow-ups.
  2. Collect telemetry on deployment failures and categorize root causes to drive improvements.
  3. Automate issue creation for recurring pipeline flakiness and prioritize fixes as part of sprint work.
  4. Maintain a deployment health dashboard showing success rate, mean time to recovery and incident correlation.
  5. Schedule periodic retros and refresh documentation and runbooks based on recurring pain points.
  6. Example A: CI artifact analysis job extracts common error messages and files tickets automatically.
  7. Example B: dashboard shows deployment success rate and average rollback time for past 90 days.
  8. Example C: monthly learning notes published to team Wiki with action items and owners.

Azure Portal and GitHub: exact clicks reference

Create or connect a GitHub repo

  1. GitHub: Repositories → New → name repo → choose visibility → initialize README; create repo.
  2. GitHub: Settings → Branches → Protection rules → require PR reviews and status checks for main branch.
  3. GitHub: Settings → Environments → create staging/production environments, add secrets and required reviewers.
  4. Azure Portal: Deployment Center → Source → select GitHub → authorize Azure to access repo → choose branch.
  5. Azure Portal: Deployment Center → Configure build provider (GitHub Actions) → confirm workflow template and commit.

Create Key Vault and set access policy

  1. Azure Portal: Create → Key Vault → provide name, region, pricing tier → Create.
  2. Networking tab: choose public access or private endpoint; if private, create/associate a Private Endpoint and private DNS zone.
  3. Access policies / Access control (IAM): grant Get/List for secrets to required managed identities or service principals.
  4. Enable purge protection and soft-delete to protect secrets from accidental or malicious deletion.
  5. Test access: from a VM/managed identity, fetch a secret via CLI or SDK to verify permissions and network path.

Register OIDC trust for GitHub Actions (Federated credential)

  1. Azure AD → App registrations → New registration or select existing app used by CI/CD.
  2. Certificates & secrets → Federated credentials → + Add federated credential → issuer https://token.actions.githubusercontent.com.
  3. Configure subject conditions (repo:org/repo:ref) to scope the trust to specific repo/branch/environment.
  4. Save and verify: use GitHub Actions azure/login with OIDC provider to confirm token exchange works.
  5. Audit: review the federated credential's creation event in Azure AD logs and record CI pipeline mapping in docs.

Create GitHub Environment with required reviewers

  1. GitHub repo → Settings → Environments → New environment → name (production/staging).
  2. Add required reviewers and specify deployment branch protection (only specific branches may deploy).
  3. Add environment secrets (e.g., PROD_SUBSCRIPTION_ID) and restrict secret access to selected workflows.
  4. Set required wait timer or custom review step to ensure cool-off period before production deploys.
  5. Test: trigger a workflow that targets the environment and ensure the approval prompt appears as configured.

Assign scoped RBAC to the CI principal

  1. Azure Portal: Subscription or Resource Group → Access control (IAM) → + Add role assignment.
  2. Select minimal role (Contributor on RG) and pick the CI service principal or federated identity as principal.
  3. Record role assignment justification and ticket reference in CMDB and enforce periodic review.
  4. Test deploy with the principal in a staging RG to validate permissions before granting prod access.
  5. Use Azure AD sign-in logs and access reviews to keep assignments lean and current.


Useful PowerShell / Az & CLI snippets

Run these in Cloud Shell, local dev machines, or CI jobs. Replace placeholders before use.

# 1. Sign in (interactive)
Connect-AzAccount

# 2. Sign in (service principal)
az login --service-principal -u APP_ID -p SECRET --tenant TENANT_ID

# 3. Select subscription
Select-AzSubscription -SubscriptionId "SUBSCRIPTION_ID"
az account set --subscription "SUBSCRIPTION_ID"

# 4. Install/upgrade Bicep
az bicep install
az bicep version

# 5. Build Bicep to ARM JSON
bicep build ./main.bicep --outdir ./compiled

# 6. Lint Bicep (the linter also runs during bicep build; bicep lint needs a recent Bicep CLI)
bicep lint ./main.bicep
# or format files consistently: bicep format ./main.bicep

# 7. Validate template (deployment validate)
az deployment group validate --resource-group rg-test --template-file ./main.json --parameters @./parameters/dev.parameters.json

# 8. What-if preview
az deployment group what-if --resource-group rg-test --template-file ./main.json --parameters @./parameters/dev.parameters.json

# 9. Create resource group
az group create --name rg-prod --location eastus

# 10. Deploy Bicep to RG
az deployment group create --resource-group rg-prod --template-file ./main.bicep --parameters @./parameters/prod.parameters.json

# 11. Deploy at subscription scope
az deployment sub create --location eastus --template-file ./main.json --parameters @./parameters/sub.parameters.json

# 12. Create service principal for CI (scoped)
az ad sp create-for-rbac --name "ci-deployer" --role "Contributor" --scopes /subscriptions/SUBSCRIPTION_ID/resourceGroups/rg-deploy

# 13. Create federated credential via az rest (use the application object ID, not the client ID, in the URI)
az rest --method POST --uri "https://graph.microsoft.com/v1.0/applications/APP_OBJECT_ID/federatedIdentityCredentials" --body @federated.json

# 14. Export resource group template
az group export --name rg-ase --output json > rg-ase-export.json

# 15. List resource groups
az group list -o table

# 16. Query Resource Graph for missing tags
az graph query -q "Resources | where isnull(tags.owner)" --first 1000

# 17. Add diagnostic setting
az monitor diagnostic-settings create --resource /subscriptions/SUBSCRIPTION_ID/resourceGroups/rg/providers/Microsoft.Web/sites/myapp --workspace /subscriptions/.../resourcegroups/rg-logs/providers/microsoft.operationalinsights/workspaces/la-aks --name diagSettings --logs '[{"category":"AppServiceHTTPLogs","enabled":true}]'

# 18. Get deployment operations
az deployment group operation list --resource-group rg-prod --name deploymentName -o table

# 19. Lock a resource (prevent deletes)
az lock create --name "NoDelete" --resource-group rg-prod --resource-name myResource --resource-type "Microsoft.Storage/storageAccounts" --lock-type CanNotDelete

# 20. Remove lock
az lock delete --name "NoDelete" --resource-group rg-prod

# 21. Set resource tags
az resource tag --tags owner=teamA environment=prod --ids /subscriptions/.../resourceGroups/rg-prod/providers/Microsoft.Storage/storageAccounts/mystorage

# 22. Show resource details
az resource show --ids /subscriptions/.../resourceGroups/rg-prod/providers/Microsoft.Storage/storageAccounts/mystorage

# 23. Run ARM template deployment in complete mode (use with caution)
az deployment group create --resource-group rg-prod --template-file ./main.bicep --mode Complete --parameters @./parameters/prod.parameters.json

# 24. Get deployment what-if delta summary (JSON)
az deployment group what-if --resource-group rg-prod --template-file ./main.json --parameters @./parameters/prod.parameters.json --query "changeSummary" -o json

# 25. Clean up ephemeral test RG (teardown)
az group delete --name rg-test --yes --no-wait

Git / GitHub common commands

Local git + GitHub CLI (gh) commands useful for IaC workflows and release management.

# 1. Clone repo (SSH)
git clone git@github.com:org/repo.git

# 2. Clone repo (HTTPS)
git clone https://github.com/org/repo.git

# 3. Create feature branch
git checkout -b feat/iac-module

# 4. Stage changes
git add .

# 5. Commit changes
git commit -m "add network module"

# 6. Amend commit (if needed)
git commit --amend --no-edit

# 7. Push branch
git push origin feat/iac-module

# 8. Create PR with gh
gh pr create --title "Add network module" --body "This PR adds network.bicep" --base main --head feat/iac-module

# 9. List PRs
gh pr list --state open

# 10. View PR status
gh pr view <pr-number> --web

# 11. Merge PR (squash)
gh pr merge <pr-number> --squash --delete-branch

# 12. Rebase branch onto main
git fetch origin
git rebase origin/main

# 13. Force-push (use carefully)
git push --force-with-lease origin feat/iac-module

# 14. Create tag for release
git tag -a v1.2.0 -m "module release v1.2.0"
git push origin v1.2.0

# 15. Create GitHub release
gh release create v1.2.0 --title "v1.2.0" --notes "Release notes here"

# 16. Create environment in repo via API/gh (example)
gh api --method PUT repos/:owner/:repo/environments/production

# 17. Set repository secret (GH CLI)
gh secret set AZURE_SUBSCRIPTION_ID --body "SUBSCRIPTION_ID"

# 18. List repo secrets
gh secret list

# 19. Download artifact from workflow run
gh run download --artifact artifact-name --run-id <run-id>

# 20. Cancel workflow run
gh run cancel <run-id>

# 21. View workflow runs
gh run list --workflow iac-ci.yml

# 22. Trigger workflow dispatch (manual)
gh workflow run iac-ci.yml --ref main --field env=staging

# 23. Check branch protection rules (API)
gh api repos/:owner/:repo/branches/main/protection

# 24. Update branch protection via API (PUT requires the full protection body; protection.json is a local settings file)
gh api --method PUT repos/:owner/:repo/branches/main/protection --input protection.json

# 25. Upload workflow artifacts from a run
# In workflow: uses: actions/upload-artifact@v4 with name: what-if-output

Docker commands

Image build, registry, runtime, inspect and maintenance commands used in CI and local development.

# 1. Build an image with tag
docker build -t myapp:local .

# 2. Build with build-arg and target
docker build --build-arg NODE_ENV=production --target runtime -t myapp:prod .

# 3. List local images
docker images --format "table {{.Repository}}\t{{.Tag}}\t{{.Size}}"

# 4. Show image history
docker history myapp:local

# 5. Run container interactively
docker run --rm -it -p 8080:80 --name myapp-run myapp:local

# 6. Run detached with env
docker run -d -p 8080:80 --name myapp -e "ENV=dev" myapp:local

# 7. Run with mounted volume (dev)
docker run --rm -v "$(pwd)":/app -w /app node:18 npm start

# 8. Tag image for registry
docker tag myapp:local myregistry.azurecr.io/myapp:1.0.0

# 9. Login to ACR
az acr login --name myregistry
# or docker login myregistry.azurecr.io

# 10. Push image to registry
docker push myregistry.azurecr.io/myapp:1.0.0

# 11. Pull image from registry
docker pull myregistry.azurecr.io/myapp:1.0.0

# 12. List running containers
docker ps --format "table {{.Names}}\t{{.Image}}\t{{.Status}}"

# 13. Show container logs (follow)
docker logs -f myapp

# 14. Exec into running container
docker exec -it myapp /bin/sh

# 15. Inspect container
docker inspect myapp

# 16. Remove a container (stopped)
docker rm myapp

# 17. Remove an image
docker rmi myapp:local

# 18. Prune unused images/containers
docker system prune --volumes --force

# 19. Save image to tar
docker save myregistry.azurecr.io/myapp:1.0.0 -o myapp_1.0.0.tar

# 20. Load image from tar
docker load -i myapp_1.0.0.tar

# 21. Build multi-platform image (buildx)
docker buildx build --platform linux/amd64,linux/arm64 -t myregistry.azurecr.io/myapp:1.0.0 --push .

# 22. Create network
docker network create iac-net

# 23. Run container attached to network
docker run -d --network iac-net --name myapp-net myapp:local

# 24. Inspect image labels
docker image inspect myapp:local --format '{{ json .Config.Labels }}'

# 25. Set resource limits on run
docker run -d --memory=512m --cpus=0.5 --name myapp-limited myapp:local

AKS commands (az aks)

AKS cluster lifecycle, node pool, addons, RBAC and maintenance commands via Azure CLI.

# 1. Create AKS cluster (basic)
az aks create -g rg-aks -n aks-prod --node-count 3 --enable-managed-identity --network-plugin azure

# 2. Create cluster with ACR integration
az aks create -g rg-aks -n aks-prod --node-count 3 --attach-acr myregistry --enable-managed-identity

# 3. Get credentials for kubectl (merge)
az aks get-credentials -g rg-aks -n aks-prod --overwrite-existing

# 4. List AKS clusters
az aks list -o table

# 5. Show AKS cluster details
az aks show -g rg-aks -n aks-prod -o json

# 6. Create a node pool
az aks nodepool add --resource-group rg-aks --cluster-name aks-prod --name np-linux --node-count 3 --kubernetes-version 1.27.4

# 7. Scale node pool
az aks nodepool scale --resource-group rg-aks --cluster-name aks-prod --name nodepool1 --node-count 5

# 8. Upgrade control plane
az aks upgrade --resource-group rg-aks --name aks-prod --kubernetes-version 1.27.4

# 9. Upgrade a node pool
az aks nodepool upgrade --resource-group rg-aks --cluster-name aks-prod --name nodepool1 --kubernetes-version 1.27.4

# 10. Enable monitoring (Container Insights)
az aks enable-addons -g rg-aks -n aks-prod --addons monitoring --workspace-resource-id /subscriptions/.../resourcegroups/rg-logs/providers/microsoft.operationalinsights/workspaces/la-aks

# 11. Enable AAD integration
az aks update -g rg-aks -n aks-prod --enable-aad --aad-admin-group-object-ids <group-id>

# 12. Enable pod identity (Azure AD workload identity)
az aks update -g rg-aks -n aks-prod --enable-oidc-issuer --enable-managed-identity

# 13. Rotate cluster certificates
az aks rotate-certs --resource-group rg-aks --name aks-prod

# 14. Scale cluster (node count for default pool)
az aks scale -g rg-aks -n aks-prod --node-count 4

# 15. Get node pool list
az aks nodepool list --resource-group rg-aks --cluster-name aks-prod -o table

# 16. List available Kubernetes upgrade versions
az aks get-upgrades --resource-group rg-aks --name aks-prod -o table

# 17. Disable autoscaler on a node pool
az aks nodepool update --resource-group rg-aks --cluster-name aks-prod --name nodepool1 --disable-cluster-autoscaler

# 18. Enable cluster-autoscaler
az aks nodepool update --resource-group rg-aks --cluster-name aks-prod --name nodepool1 --enable-cluster-autoscaler --min-count 3 --max-count 10

# 19. Get cluster credentials and write to file
az aks get-credentials -g rg-aks -n aks-prod --file ./kubeconfig-aks-prod

# 20. Upgrade node images only (reimage)
az aks nodepool upgrade --resource-group rg-aks --cluster-name aks-prod --name nodepool1 --node-image-only

# 21. Enable HTTP application routing addon (legacy example; superseded by the application routing add-on)
az aks enable-addons --resource-group rg-aks --name aks-prod --addons http_application_routing

# 22. Enable Azure RBAC for Kubernetes authorization (example)
az aks update -g rg-aks -n aks-prod --enable-azure-rbac

# 23. Delete node pool
az aks nodepool delete --resource-group rg-aks --cluster-name aks-prod --name np-linux

# 24. Delete AKS cluster
az aks delete -g rg-aks -n aks-prod --yes --no-wait

# 25. AKS diagnostics (run a command inside the cluster via the managed command API)
az aks command invoke -g rg-aks -n aks-prod --command "kubectl get pods --all-namespaces -o wide"

kubectl commands

kubectl commands for inspection, debug, rollout, and port-forwarding in AKS or any Kubernetes cluster.

# 1. Show cluster context and current namespace
kubectl config current-context
kubectl config get-contexts

# 2. List nodes
kubectl get nodes -o wide

# 3. Describe a node
kubectl describe node aks-nodepool1-12345678-vmss000000

# 4. List pods in namespace
kubectl get pods -n prod

# 5. List pods across namespaces
kubectl get pods --all-namespaces

# 6. Describe a pod
kubectl describe pod myapp-abc123 -n prod

# 7. View logs for a pod (single container)
kubectl logs pod/myapp-abc123 -n prod

# 8. Tail logs (follow)
kubectl logs -f deployment/myapp -n prod

# 9. Exec into a pod
kubectl exec -it deployment/myapp -n prod -- /bin/sh

# 10. Apply manifest
kubectl apply -f k8s/deployment.yaml -n prod

# 11. Delete resource
kubectl delete -f k8s/obsolete.yaml -n prod

# 12. Diff local manifest vs cluster
kubectl diff -f k8s/deployment.yaml -n prod

# 13. Scale a deployment
kubectl scale deployment myapp --replicas=5 -n prod

# 14. Rollout status
kubectl rollout status deployment/myapp -n prod

# 15. Rollout undo
kubectl rollout undo deployment/myapp -n prod

# 16. Get service details
kubectl get svc myapp-service -n prod -o yaml

# 17. Port-forward service to local
kubectl port-forward svc/myapp-service 8080:80 -n prod

# 18. Describe ingress
kubectl describe ingress myapp-ingress -n prod

# 19. Apply patch to resource
kubectl patch deployment myapp -n prod --type='json' -p='[{"op":"replace","path":"/spec/replicas","value":3}]'

# 20. Set image of deployment (rolling update)
kubectl set image deployment/myapp myapp=registry.azurecr.io/myapp:1.0.1 -n prod

# 21. Create secret from literal
kubectl create secret generic db-creds --from-literal=username=app --from-literal=password=secret -n prod

# 22. Get events (recent)
kubectl get events -n prod --sort-by='.metadata.creationTimestamp'

# 23. Top nodes and pods (metrics-server required)
kubectl top nodes
kubectl top pods -n prod

# 24. Port-forward to pod for debugging
kubectl port-forward pod/myapp-abc123 3000:3000 -n prod

# 25. Run a one-off debugging pod (ephemeral)
kubectl run tmp-shell -n prod --rm -it --image=alpine -- /bin/sh

Runbook: recommended CI workflow

Stage 1 - Pull Request validation

  1. Lint and static checks: run bicep linter, style checks and custom naming validators.
  2. Build and compilation: run bicep build to generate ARM JSON and fail on build errors.
  3. What-if preview: run az deployment what-if and summarize resource deltas in PR comment.
  4. Unit tests and small integration tests: run any available template unit tests locally/CI (mocked resources).
  5. Policy checks: validate required policy assignments are present and the change does not violate guardrails.

Stage 2 - Staging deploy

  1. Deploy to a staging resource group using staging parameter file and publish build artifacts.
  2. Run automated integration and smoke tests (end-to-end flows, dependency checks, private endpoint reachability).
  3. Collect and store artifacts: what-if output, deployment logs and test artifacts for audit and debugging.
  4. Run synthetic probes and baseline comparisons (latency, error rates) to ensure performance within SLA-like expectations.
  5. Notify owners and stakeholders of staging success and include rollback instructions if tests fail.

Stage 3 - Production deploy

  1. Require manual approval in protected environment; approver must reference change ticket and acknowledge risk.
  2. Deploy with prod parameter file; use feature flags or slot-based deployments where applicable to reduce impact.
  3. Run post-deploy validations: smoke tests, health checks, DNS resolution and private endpoint connectivity.
  4. Trigger monitoring validation: ensure alerts and telemetry ingestion are functioning and no unexplained spikes in errors.
  5. Execute automated rollback plan if critical failures detected; record outcome and create incident ticket for RCA.

 This article was originally published on 2025-NOV-16 and last reviewed on 2025-NOV-17.