A mangled comment merged the "Create k3d cluster" step header with
the previous yarn install step, causing a duplicate `run` key that
prevented the entire Integrity workflow from loading.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Use the Checks API to flip failed macOS build conclusions to neutral
(gray dash) so unstable builds don't show red X marks on PRs.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The monolithic orchestrator-integrity workflow runs 25+ tests sequentially
in a single job, consistently hitting the 60-minute timeout on PR runs.
Split into 4 parallel jobs (k8s, aws-provider, local-docker, rclone) each
on its own runner, cutting wall-clock time from 3+ hours to ~1 hour and
eliminating disk space exhaustion from shared runner contention.
Adopts the parallel architecture from PR #809.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Rewrite the monolith orchestrator-integrity.yml (1110 lines, single job,
3+ hour sequential execution) into 4 parallel jobs that run on separate
runners:
- k8s-tests: k3d cluster + LocalStack, 5 tests
- aws-provider-tests: LocalStack only, 10 tests
- local-docker-tests: Docker + LocalStack for S3 tests, 9 tests
- rclone-tests: rclone + LocalStack, 1 test
Key improvements:
- Wall-clock time drops from ~3h to ~1h (longest single job)
- Disk exhaustion eliminated: each job gets its own fresh 14GB runner
- Cleanup logic deduplicated via sourced shell functions instead of
15 copy-pasted 30-line blocks
- K3d node image cleanup only runs in the k8s job (where it matters)
- Light cleanup (cache + docker prune -f) between tests; heavy cleanup
(prune -af --volumes) only at job boundaries
- workflow_call interface unchanged; integrity-check.yml needs no changes
Ref: #794
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- GitHub Actions: max 4-hour polling with clear timeout error including run URL
- GitLab CI: max 4-hour polling with clear timeout error including pipeline URL
- Remote PowerShell: fix credential split to preserve passwords with colons
(split on first colon only instead of all colons)
- Remote PowerShell: throw clear error when credential format is invalid
- Ansible: validate ansible-playbook binary exists in setupWorkflow
(separate from ansible --version check)
- All timeout errors use core.error() for GitHub Actions annotation visibility
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Check available disk space (cross-platform: wmic/df) before archive
operations to prevent data loss on full disks. Skip archival with
warning if insufficient space (10% safety margin). Clean up partial
archives on tar failure. Proceed with warning when space check fails.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Cap pagination at 100 pages (10,000 runners max), detect GitHub API
rate limiting (403/429) with reset time reporting, add 30-second total
timeout for pagination loop. Log clear diagnostic when no runners found
suggesting possible causes (token permissions, runner registration).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Validate secret key names against alphanumeric allowlist before shell interpolation
- Apply validation in both SecretSourceService.fetchSecret() and legacy queryOverride()
- Mask fetched secret values with core.setSecret() to prevent log exposure
- Add 20 new tests for validation and masking
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Prevent builds from hanging indefinitely when CLI provider subprocess
is unresponsive. Default 2h for runTaskInWorkflow, 1h for watchWorkflow.
Graceful SIGTERM with 10s grace before SIGKILL.
- Added RUN_TASK_TIMEOUT_MS (2 hours) and WATCH_WORKFLOW_TIMEOUT_MS (1 hour)
- Added gracefulKill helper: SIGTERM first, SIGKILL after 10s grace period
- runTaskInWorkflow and watchWorkflow now have timeout protection
- Existing execute() method upgraded to use gracefulKill
- core.error() called with clear human-readable timeout message
- Added comprehensive tests: timeout triggers, SIGKILL escalation,
grace period cancellation on voluntary exit, normal completion
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Implement two-level workspace isolation pattern for enterprise-scale CI:
- Atomic O(1) workspace restore via filesystem move (no tar/download/extract)
- Separate Library caching for independent restore
- .git preservation for delta operations
- Stale workspace cleanup with configurable retention policies
- 5 new action inputs: childWorkspacesEnabled, childWorkspaceName,
childWorkspaceCacheRoot, childWorkspacePreserveGit,
childWorkspaceSeparateLibrary
- 28 unit tests covering all service methods
This enables enterprise CI where workspaces are 50GB+ and traditional
caching via actions/cache is impractical. On NTFS, workspace restore
is O(1) via atomic rename when source and destination are on the same volume.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds BuildReliabilityService with the following capabilities:
- checkGitIntegrity(): runs git fsck --no-dangling and parses output for corruption
- cleanStaleLockFiles(): removes stale .lock files older than 10 minutes
- validateSubmoduleBackingStores(): validates .git files point to valid backing stores
- recoverCorruptedRepo(): orchestrates fsck, lock cleanup, re-fetch, retry fsck
- cleanReservedFilenames(): removes Windows reserved filenames (con, prn, aux, nul, com1-9, lpt1-9)
- archiveBuildOutput(): creates tar.gz archive of build output
- enforceRetention(): deletes archives older than retention period
- configureGitEnvironment(): sets GIT_TERMINAL_PROMPT=0, http.postBuffer, core.longpaths
Wired into action.yml as opt-in inputs, with pre-build integrity checks and
post-build archival in the main entry point.
Includes 29 unit tests covering success and failure cases for all methods.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add three optional reliability features for hardening CI pipelines:
- Git corruption detection & recovery (fsck, stale lock cleanup,
submodule backing store validation, auto-recovery)
- Reserved filename cleanup (removes Windows device names that
cause Unity asset importer infinite loops)
- Build output archival with configurable retention policy
All features are opt-in and fail gracefully with warnings only.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add four new providers that delegate builds to external CI platforms:
- remote-powershell: Execute on remote machines via WinRM/SSH
- github-actions: Dispatch workflow_dispatch on target repository
- gitlab-ci: Trigger pipeline via GitLab API
- ansible: Run playbooks against managed inventory
Each follows the CI-as-a-provider pattern: trigger remote job,
pass build parameters, stream logs, report status.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Built-in support for Unity Git Hooks (com.frostebite.unitygithooks):
- Auto-detect UPM package in Packages/manifest.json
- Run init-unity-lefthook.js before hook installation
- Set CI-friendly env vars (disable background project mode)
New gitHooksRunBeforeBuild input runs specific lefthook groups before
the Unity build, allowing CI to trigger pre-commit or pre-push checks
that normally only fire on git events.
35 unit tests covering detection, init, CI env, group execution, and
failure handling.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
First-class support for elastic-git-storage as a custom LFS transfer
agent. When lfsTransferAgent is set to "elastic-git-storage" (or
"elastic-git-storage@v1.0.0" for a specific version), the service
automatically finds or installs the agent from GitHub releases, then
configures it via git config.
Supports version pinning via @version suffix in the agent value,
eliminating the need for a separate version parameter. Platform and
architecture detection handles linux/darwin/windows on amd64/arm64.
37 unit tests covering detection, PATH lookup, installation, version
parsing, and configuration delegation.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds three Vault entries: hashicorp-vault (KV v2), hashicorp-vault-kv1
(KV v1), and vault (short alias). Uses VAULT_ADDR for server address and
VAULT_MOUNT env var for configurable mount path (defaults to 'secret').
Refs #776
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace token-in-URL pattern with http.extraHeader for git clone and LFS
operations. The token no longer appears in clone URLs, git remote config,
or process command lines.
Add gitAuthMode input (default: 'header', legacy: 'url') so users can
fall back to the old behavior if needed.
Closes#785
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add comprehensive tests for CLI provider (cleanupWorkflow, garbageCollect,
listWorkflow, watchWorkflow, stderr forwarding, timeout handling), local
cache service (saveLfsCache full path and error handling), git hooks service
(husky install, failure logging, edge cases), and LFS agent service (empty
storagePaths, validate logging). 73 tests across 4 test files.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds a fast-fail unit test step at the top of orchestrator-integrity,
right after yarn install and before any infrastructure setup (k3d,
LocalStack). Runs 113 mock-based orchestrator tests in ~5 seconds.
If serialization, path computation, log parsing, or provider loading
is broken, the workflow fails immediately instead of spending 30+
minutes setting up LocalStack and k3d clusters.
Tests included: orchestrator-guid, orchestrator-folders,
task-parameter-serializer, follow-log-stream-service,
runner-availability-service, provider-url-parser, provider-loader,
provider-git-manager, orchestrator-image, orchestrator-hooks,
orchestrator-github-checks.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds 64 new mock-based unit tests covering orchestrator services that
previously had zero test coverage:
- TaskParameterSerializer: env var format conversion, round-trip,
uniqBy deduplication, blocked params, default secrets
- FollowLogStreamService: build output message parsing — end of
transmission, build success/failure detection, error accumulation,
Library rebuild detection
- OrchestratorNamespace (guid): GUID generation format, platform
name normalization, nanoid uniqueness
- OrchestratorFolders: path computation for all folder getters,
ToLinuxFolder conversion, repo URL generation, purge flag detection
All tests are pure mock-based and run without any external
infrastructure (no LocalStack, K8s, Docker, or AWS).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Covers: no token skip, no runners fallback, busy/offline runners,
label filtering (case-insensitive), minAvailable threshold,
fail-open on API error, mixed runner states.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds tests for cache hit restore (picks latest tar), LFS cache
restore/save, garbage collection age filtering, and edge cases
like permission errors and empty directories.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds retryOnFallback (retry failed builds on alternate provider) and
providerInitTimeout (swap provider if init takes too long). Refactors
run() into run()/runWithProvider() to support retry loop.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds built-in load balancing: check GitHub runner availability before
builds start, auto-route to a fallback provider when runners are busy
or offline. Eliminates the need for a separate check-runner job.
New inputs: fallbackProviderStrategy, runnerCheckEnabled,
runnerCheckLabels, runnerCheckMinAvailable.
Outputs providerFallbackUsed and providerFallbackReason for workflow
visibility.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Both providers now support four storage backends via gcpStorageType / azureStorageType:
GCP Cloud Run:
- gcs-fuse: Mount GCS bucket as POSIX filesystem (unlimited, best for large sequential I/O)
- gcs-copy: Copy artifacts in/out via gsutil (simpler, no FUSE overhead)
- nfs: Filestore NFS mount (true POSIX, good random I/O, up to 100 TiB)
- in-memory: tmpfs (fastest, volatile, up to 32 GiB)
Azure ACI:
- azure-files: SMB file share mount (up to 100 TiB, premium throughput)
- blob-copy: Copy artifacts in/out via az storage blob (no mount overhead)
- azure-files-nfs: NFS 4.1 file share mount (true POSIX, no SMB lock overhead)
- in-memory: emptyDir tmpfs (fastest, volatile, limited by container memory)
New inputs: gcpStorageType, gcpFilestoreIp, gcpFilestoreShare, azureStorageType,
azureBlobContainer. Constructor validates storage config and warns on missing
prerequisites (e.g. NFS requires VPC connector/subnet).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add two new cloud provider implementations for the orchestrator, both marked
as experimental:
- **GCP Cloud Run Jobs** (`providerStrategy: gcp-cloud-run`): Executes Unity
builds as Cloud Run Jobs with GCS FUSE for large artifact storage. Supports
configurable machine types, service accounts, and VPC connectors. 7 new inputs
(gcpProject, gcpRegion, gcpBucket, gcpMachineType, gcpDiskSizeGb,
gcpServiceAccount, gcpVpcConnector).
- **Azure Container Instances** (`providerStrategy: azure-aci`): Executes Unity
builds as ACI containers with Azure File Shares (Premium FileStorage) for
large artifact storage up to 100 TiB. Supports configurable CPU/memory,
VNet integration, and subscription targeting. 9 new inputs
(azureResourceGroup, azureLocation, azureStorageAccount, azureFileShareName,
azureSubscriptionId, azureCpu, azureMemoryGb, azureDiskSizeGb, azureSubnetId).
Both providers use their respective CLIs (gcloud, az) for infrastructure
management and support garbage collection of old build resources. No tests
included as these require real cloud infrastructure to validate.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add generic enterprise-grade features to the orchestrator, enabling Unity projects with
complex CI/CD pipelines to adopt game-ci/unity-builder with built-in support for:
- CLI provider protocol: JSON-over-stdin/stdout bridge enabling providers in any language
(Go, Python, Rust, shell) via the `providerExecutable` input
- Submodule profiles: YAML-based selective submodule initialization with glob patterns
and variant overlays (`submoduleProfilePath`, `submoduleVariantPath`)
- Local build caching: Filesystem-based Library and LFS caching for local builds without
external cache actions (`localCacheEnabled`, `localCacheRoot`)
- Custom LFS transfer agents: Register external transfer agents like elastic-git-storage
(`lfsTransferAgent`, `lfsTransferAgentArgs`, `lfsStoragePaths`)
- Git hooks support: Detect and install lefthook/husky with configurable skip lists
(`gitHooksEnabled`, `gitHooksSkipList`)
Also removes all `orchestrator-develop` branch references, replacing with `main`.
13 new action inputs, 13 new files, 14 new CLI provider tests, 17 submodule tests,
plus cache/LFS/hooks unit tests. All 452 tests pass.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Rename "Cloud Runner" to "Orchestrator" across entire codebase
Breaking change: All CloudRunner classes, options, environment variables,
and action.yml inputs have been renamed to Orchestrator equivalents.
- Renamed src/model/cloud-runner/ directory to src/model/orchestrator/
- Renamed all cloud-runner-* files to orchestrator-*
- Renamed all CloudRunner* classes to Orchestrator* (15+ classes)
- Renamed all cloudRunner* properties to orchestrator* equivalents
- Renamed CLOUD_RUNNER_* env vars to ORCHESTRATOR_*
- Updated action.yml [CloudRunner] markers to [Orchestrator]
- Updated workflow files and package.json test scripts
- Updated all runtime strings (cache paths, log messages, branch refs)
- Rebuilt dist/index.js
No backward compatibility layer is provided.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Remove tracked log/temp files and add to .gitignore
Remove $LOG_FILE and temp/job-log.txt debug artifacts that should
not be in the repository.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>