Model Context Protocol for EMR operations

Initial diagnosis for EMR failures, built for AWS DevOps Agent.

Harrier collects bounded evidence from EMR APIs, Spark logs, CloudWatch, S3, and optional Kubernetes diagnostics, then turns the run into a readable triage report instead of another raw log dump.

Harrier Initial Diagnosis Report
Likely path: Spark Runtime
Signal: SHUFFLE_SPILL
Confidence: Medium-High

Initial Triage

  • Infrastructure NOT CHECKED
  • Data NOT CHECKED
  • Spark Runtime ISSUE
    • Driver NOT CHECKED
    • Shuffle ISSUE
    • Executors NOT CHECKED
  • Observability PASS

EMR APIs · S3 Logs · CloudWatch · Spark/YARN · Kubernetes · GitHub PR Preview

Runtime-aware, one report contract

Different EMR shapes. Same investigation language.

EMR on EC2

Classic clusters, noisy evidence

Harrier maps steps, YARN applications, S3 log layouts, CloudWatch metrics, Spark driver output, and executor signals into one initial triage board.

EMR Serverless

Job runs without cluster guesswork

Serverless application metadata, job run state, monitoring config, S3 logs, CloudWatch logs, and worker sizing evidence stay tied to the same report model.

EMR on EKS

Spark plus pod reality

EMR Containers evidence is combined with optional read-only Kubernetes pod diagnostics so scheduling, image pull, and eviction failures do not hide behind Spark errors.

Readable before it is exhaustive

Harrier separates checked evidence from open questions.

The report is explicit that these are initial checks. Items that were not evaluated are marked NOT CHECKED, while attempted checks with incomplete evidence become INCONCLUSIVE.
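
The distinction between the statuses can be sketched as a small decision function. This is only an illustration: the names `CheckStatus` and `resolve_status` and the three boolean inputs are hypothetical, and the precedence (a found issue is reported even when evidence is partial) is an assumption, not Harrier's documented behavior.

```python
from enum import Enum


class CheckStatus(Enum):
    """Status labels as they appear in the report."""
    PASS = "PASS"
    ISSUE = "ISSUE"
    NOT_CHECKED = "NOT CHECKED"
    INCONCLUSIVE = "INCONCLUSIVE"


def resolve_status(attempted: bool, evidence_complete: bool, issue_found: bool) -> CheckStatus:
    """Map the outcome of one initial check to a report status (hypothetical rules)."""
    if not attempted:
        # The check was never evaluated, so nothing can be claimed about it.
        return CheckStatus.NOT_CHECKED
    if issue_found:
        # Assumption: a detected problem is surfaced even if evidence is partial.
        return CheckStatus.ISSUE
    if not evidence_complete:
        # The check ran but could not see enough to say PASS.
        return CheckStatus.INCONCLUSIVE
    return CheckStatus.PASS
```

Keeping NOT CHECKED separate from INCONCLUSIVE means a reader never has to guess whether a missing finding reflects a clean result or a check that did not run.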

| Area | Status | Initial read |
| --- | --- | --- |
| Infrastructure | NOT CHECKED | IAM, S3, KMS, bootstrap, cluster capacity |
| Data | NOT CHECKED | Input path, schema, bad records, SQL, output path |
| Spark Runtime | ISSUE | Shuffle spill found in executor logs |
| Observability | PASS | Driver and executor log evidence available |
| Configuration | NOT CHECKED | Spark config and sizing need follow-up |

Signal

Tasks reported nonzero memory bytes spilled and disk bytes spilled.

Interpretation

Spark shuffle spill is likely slowing the job or exhausting local disk.

Next check

Inspect failed stages, spill volume, skew, and shuffle partitions.
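
As a rough illustration of how a spill signal like the one above might be quantified, the sketch below sums spill counters from Spark event log lines. It assumes the JSON event log format, where `SparkListenerTaskEnd` events carry `Memory Bytes Spilled` and `Disk Bytes Spilled` under `Task Metrics`. The function name `summarize_spill` is hypothetical, and this is not a description of how Harrier itself detects the signal.

```python
import json
from typing import Dict, Iterable


def summarize_spill(event_lines: Iterable[str]) -> Dict[str, int]:
    """Sum spill counters across task-end events in a Spark event log."""
    memory_spilled = 0
    disk_spilled = 0
    tasks_with_spill = 0
    for line in event_lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate truncated or non-JSON lines
        if not isinstance(event, dict) or event.get("Event") != "SparkListenerTaskEnd":
            continue
        metrics = event.get("Task Metrics") or {}
        mem = int(metrics.get("Memory Bytes Spilled", 0))
        disk = int(metrics.get("Disk Bytes Spilled", 0))
        if mem or disk:
            tasks_with_spill += 1
        memory_spilled += mem
        disk_spilled += disk
    return {
        "tasks_with_spill": tasks_with_spill,
        "memory_bytes_spilled": memory_spilled,
        "disk_bytes_spilled": disk_spilled,
    }
```

Totals like these give the follow-up check a starting point: a few tasks with very large spill hints at skew, while spill spread evenly across tasks points toward partition count or executor memory.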

Built as a headless MCP server

AWS DevOps Agent owns the conversation. Harrier owns the evidence model, collectors, classifier, human diagnosis report, and dry-run recommendation preview.


Operational safety

Designed for investigation, not surprise mutation.

Bounded reads

Harrier reads scoped operational evidence instead of crawling entire buckets or clusters.
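
One way to picture a bounded read is a selection step that caps both the number of objects and the total bytes fetched under a single prefix. The sketch below is a minimal illustration under that assumption. `select_within_budget` and its parameters are hypothetical names, and the input is a plain list of `(key, size)` pairs rather than a real S3 listing.

```python
from typing import Iterable, List, Tuple


def select_within_budget(
    objects: Iterable[Tuple[str, int]],
    prefix: str,
    max_objects: int,
    max_bytes: int,
) -> List[str]:
    """Pick object keys under one prefix without exceeding count or byte limits."""
    selected: List[str] = []
    remaining = max_bytes
    for key, size in objects:
        if len(selected) >= max_objects:
            break  # object-count cap reached, stop scanning
        if not key.startswith(prefix):
            continue  # outside the scoped prefix
        if size > remaining:
            continue  # would exceed the byte budget, skip it
        selected.append(key)
        remaining -= size
    return selected
```

Because both limits are fixed before any read starts, the cost of an investigation stays predictable even when a log prefix is unexpectedly large.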

Redaction

Common secret patterns are redacted, and log text is treated as untrusted input.
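
A minimal sketch of pattern-based redaction, assuming two illustrative patterns: AWS access key IDs (the `AKIA`/`ASIA` prefix followed by 16 characters) and `key=value` pairs whose key mentions a password, secret, or token. The pattern set and the `redact` function are hypothetical. Harrier's actual patterns are not listed here.

```python
import re

# Assumed pattern: AWS access key IDs (long-term AKIA or temporary ASIA prefix).
ACCESS_KEY = re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b")

# Assumed pattern: key/value pairs whose key name suggests a secret.
KEY_VALUE = re.compile(
    r"(?i)([\w.-]*(?:password|secret|token)[\w.-]*)(\s*[=:]\s*)\S+"
)


def redact(text: str) -> str:
    """Replace likely secrets with a placeholder, keeping key names for context."""
    text = ACCESS_KEY.sub("[REDACTED]", text)
    # Keep the key and separator so the log line stays readable.
    return KEY_VALUE.sub(r"\1\2[REDACTED]", text)
```

For example, `redact("spark.hadoop.fs.s3a.secret.key=abc123")` keeps the key name and replaces only the value. Pattern matching of this kind catches common shapes only, so it complements scoped reads rather than replacing them.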

Dry-run first

PR recommendations are advisory unless explicit repository write guardrails are enabled.