Model Context Protocol for EMR operations

Initial diagnosis for EMR failures, built for AWS DevOps Agent.

Harrier collects bounded evidence from EMR APIs, Spark logs, CloudWatch, S3, and optional Kubernetes diagnostics, then turns the run into a readable triage report instead of another raw log dump.


Harrier Initial Diagnosis Report

Likely path: Spark Runtime

Signal: SHUFFLE_SPILL

Confidence: Medium-High

Initial Triage

Infrastructure NOT CHECKED
Data NOT CHECKED
Spark Runtime ISSUE
- Driver NOT CHECKED
- Shuffle ISSUE
- Executors NOT CHECKED
Observability PASS

EMR APIs, S3 Logs, CloudWatch, Spark/YARN, Kubernetes, GitHub PR Preview

Runtime-aware, one report contract

Different EMR shapes. Same investigation language.

EMR on EC2

Classic clusters, noisy evidence

Harrier maps steps, YARN applications, S3 log layouts, CloudWatch metrics, Spark driver output, and executor signals into one initial triage board.

EMR Serverless

Job runs without cluster guesswork

Serverless application metadata, job run state, monitoring config, S3 logs, CloudWatch logs, and worker sizing evidence stay tied to the same report model.

EMR on EKS

Spark plus pod reality

EMR Containers evidence is combined with optional read-only Kubernetes pod diagnostics so scheduling, image pull, and eviction failures do not hide behind Spark errors.
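As a rough illustration of the pod-level checks described above, the sketch below classifies a pod from its status block, as returned by `kubectl get pod -o json`. The function name `classify_pod` and the category labels are hypothetical and are not taken from Harrier's code. The status fields and reason strings (`Evicted`, `Unschedulable`, `ErrImagePull`, `ImagePullBackOff`) are standard Kubernetes values.

```python
from typing import Any, Optional

# Container waiting reasons that Kubernetes reports for image pull problems.
IMAGE_PULL_REASONS = {"ErrImagePull", "ImagePullBackOff"}


def classify_pod(pod: dict[str, Any]) -> Optional[str]:
    """Return a coarse failure category for a pod, or None if nothing matches.

    The category names are illustrative assumptions, not Harrier's labels.
    """
    status = pod.get("status", {})

    # Evicted pods carry the reason at the top level of the status block.
    if status.get("reason") == "Evicted":
        return "EVICTION"

    # A pod that cannot be placed has PodScheduled=False, reason Unschedulable.
    for cond in status.get("conditions", []):
        if (
            cond.get("type") == "PodScheduled"
            and cond.get("status") == "False"
            and cond.get("reason") == "Unschedulable"
        ):
            return "SCHEDULING"

    # Image pull failures appear as a waiting state on a container.
    for container in status.get("containerStatuses", []):
        waiting = container.get("state", {}).get("waiting") or {}
        if waiting.get("reason") in IMAGE_PULL_REASONS:
            return "IMAGE_PULL"

    return None
```

The sketch only reads an already-fetched status document, in keeping with the read-only framing above. It does not inspect init containers or events, which a fuller check would likely need.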

Readable before it is exhaustive

Harrier separates checked evidence from open questions.

The report is explicit that these are initial checks. Items that were not evaluated are marked NOT CHECKED, while attempted checks with incomplete evidence become INCONCLUSIVE.
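One way to picture the status vocabulary is a small enum plus a resolver. This is a minimal sketch: the names `CheckStatus` and `resolve_status` are hypothetical, and the rule that a detected issue outranks incomplete evidence is an assumption rather than documented Harrier behavior.

```python
from enum import Enum


class CheckStatus(Enum):
    PASS = "PASS"
    ISSUE = "ISSUE"
    INCONCLUSIVE = "INCONCLUSIVE"
    NOT_CHECKED = "NOT CHECKED"


def resolve_status(attempted: bool, evidence_complete: bool, issue_found: bool) -> CheckStatus:
    """Map the outcome of one check onto a report status."""
    if not attempted:
        # The check never ran, so nothing can be claimed either way.
        return CheckStatus.NOT_CHECKED
    if issue_found:
        # Assumption: a positive finding is reported even if other evidence is missing.
        return CheckStatus.ISSUE
    if not evidence_complete:
        # The check ran but could not see enough to clear the area.
        return CheckStatus.INCONCLUSIVE
    return CheckStatus.PASS
```

Keeping NOT CHECKED distinct from INCONCLUSIVE means a reader can tell "nobody looked" apart from "someone looked and could not decide".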


Area	Status	Initial read
Infrastructure	NOT CHECKED	IAM, S3, KMS, bootstrap, cluster capacity
Data	NOT CHECKED	Input path, schema, bad records, SQL, output path
Spark Runtime	ISSUE	Shuffle spill found in executor logs
Observability	PASS	Driver and executor log evidence available
Configuration	NOT CHECKED	Spark config and sizing need follow-up

Signal

Task reported memory bytes spilled and disk bytes spilled.

Interpretation

Spark shuffle spill is likely slowing the job or exhausting local disk.

Next check

Inspect failed stages, spill volume, skew, and shuffle partitions.
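To make the spill signal concrete, the sketch below totals the spill metrics that Spark records per task. It assumes the input is a Spark event log (one JSON object per line), where `SparkListenerTaskEnd` events carry `Memory Bytes Spilled` and `Disk Bytes Spilled` under `Task Metrics`. Harrier's own collector may read a different source, such as executor logs, and the function name `sum_spill` is hypothetical.

```python
import json
from typing import Iterable


def sum_spill(event_lines: Iterable[str]) -> dict[str, int]:
    """Total memory and disk spill bytes across task-end events."""
    totals = {"memory_bytes_spilled": 0, "disk_bytes_spilled": 0}
    for line in event_lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            # Log text is untrusted; skip lines that are not valid JSON.
            continue
        if not isinstance(event, dict) or event.get("Event") != "SparkListenerTaskEnd":
            continue
        metrics = event.get("Task Metrics") or {}
        totals["memory_bytes_spilled"] += int(metrics.get("Memory Bytes Spilled", 0))
        totals["disk_bytes_spilled"] += int(metrics.get("Disk Bytes Spilled", 0))
    return totals
```

Non-zero totals support the SHUFFLE_SPILL signal. Grouping the same metrics by stage or by task would be the natural next step for the skew check.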

Built as a headless MCP server

AWS DevOps Agent owns the conversation. Harrier owns the evidence model, collectors, classifier, human diagnosis report, and dry-run recommendation preview.


Operational safety

Designed for investigation, not surprise mutation.

Bounded reads

Harrier reads scoped operational evidence instead of crawling entire buckets or clusters.
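A bounded read can be as simple as a byte budget applied to a stream. The sketch below is illustrative only: `read_bounded` is a hypothetical helper, and the chunk iterator stands in for whatever source is being read (an S3 object body, a log stream, and so on).

```python
from typing import Iterable


def read_bounded(chunks: Iterable[bytes], max_bytes: int) -> tuple[bytes, bool]:
    """Read at most max_bytes from a chunk stream.

    Returns the collected bytes and a flag saying whether input was cut off.
    """
    buf = bytearray()
    truncated = False
    for chunk in chunks:
        remaining = max_bytes - len(buf)
        if len(chunk) > remaining:
            # Take only what fits in the budget, then stop consuming the stream.
            buf.extend(chunk[:remaining])
            truncated = True
            break
        buf.extend(chunk)
    return bytes(buf), truncated
```

Returning the truncation flag lets a report say that evidence was cut off, which fits the distinction between a finished check and an inconclusive one.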

Redaction

Common secret patterns are redacted, and log text is treated as untrusted input.
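A minimal sketch of pattern-based redaction follows. The patterns are assumptions chosen for illustration: AWS access key IDs (the `AKIA` and `ASIA` prefixes followed by 16 uppercase alphanumerics) and simple `password=`, `secret=`, or `token=` assignments. Harrier's actual pattern list is not shown in this page, and `redact` is a hypothetical name.

```python
import re

# Each entry pairs a compiled pattern with its replacement text.
_PATTERNS = [
    # AWS access key IDs: long-term (AKIA) and temporary (ASIA) prefixes.
    (re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"), "[REDACTED_AWS_KEY_ID]"),
    # key=value or key: value pairs for a few common secret names.
    (re.compile(r"(?i)\b(password|secret|token)(\s*[=:]\s*)\S+"), r"\1\2[REDACTED]"),
]


def redact(text: str) -> str:
    """Replace matches of known secret patterns before text reaches a report."""
    for pattern, replacement in _PATTERNS:
        text = pattern.sub(replacement, text)
    return text
```

For example, `redact("password=hunter2 ok")` returns `"password=[REDACTED] ok"`. Pattern matching of this kind catches common cases only, so redacted output should still be handled as potentially sensitive.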

Dry-run first

PR recommendations are advisory unless explicit repository write guardrails are enabled.