1. Enterprise Reference Architecture
AI generates artifacts; controlled automation executes them. Keep approvals, policy enforcement, and auditability in the middle.
Engineer / Ops / Security
|
v
AI Assistant (generation only)
|
v
Automation Orchestrator (policy, approvals, templating)
|
+---> Inventory DB / CMDB
+---> Secrets Vault
|
v
Zowe CLI / z/OSMF REST / JES submit
|
v
JES -> IKJEFT01 -> RACDCERT -> RACF DB
2. Governance & Controls
- Separation of duties (request / approve / execute)
- Restricted automation IDs (least privilege)
- Policy engine: allow-list RACDCERT verbs + targets
- Audit logging: prompts, generated JCL, approvals, JES output
- Promotion path: DEV → TEST → PROD
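The allow-list control above can be sketched as a small pre-submission filter. This is a minimal illustration, not the handbook's actual policy engine: the function names, the set of permitted verbs, and the parsing approach are all assumptions, and the scanner does not handle parentheses inside quoted labels.

```python
# Hypothetical allow-list; a real policy would also constrain targets (IDs, rings).
ALLOWED_RACDCERT_VERBS = {"GENCERT", "GENREQ", "ADD", "ADDRING", "CONNECT", "LIST", "LISTRING"}
ALLOWED_OTHER_COMMANDS = {"SETROPTS RACLIST(DIGTCERT) REFRESH"}
# RACDCERT operands that select an owner rather than name the function.
OWNER_OPERANDS = {"ID", "CERTAUTH", "SITE"}


def join_continuations(lines):
    """Merge TSO continuation lines (trailing '+' or '-') into single commands."""
    commands, current = [], ""
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.endswith(("+", "-")):
            current += line[:-1].strip() + " "
        else:
            commands.append((current + line).strip())
            current = ""
    if current:
        commands.append(current.strip())
    return commands


def top_level_keywords(command):
    """Return the keywords that appear outside any parentheses, uppercased."""
    depth, word, words = 0, "", []
    for ch in command:
        if ch == "(":
            if depth == 0 and word:
                words.append(word)
            word = ""
            depth += 1
        elif ch == ")":
            depth -= 1
        elif depth == 0 and ch.isalnum():
            word += ch
        elif depth == 0:
            if word:
                words.append(word)
            word = ""
    if word:
        words.append(word)
    # Unbalanced parentheses yield no keywords, so the command is rejected.
    return [w.upper() for w in words] if depth == 0 else []


def rejected_commands(systsin_lines):
    """Return every command that is not covered by the allow-list."""
    rejected = []
    for command in join_continuations(systsin_lines):
        if " ".join(command.split()).upper() in ALLOWED_OTHER_COMMANDS:
            continue
        words = top_level_keywords(command)
        if words[:1] == ["RACDCERT"]:
            verbs = [w for w in words[1:] if w not in OWNER_OPERANDS]
            if verbs and verbs[0] in ALLOWED_RACDCERT_VERBS:
                continue
        rejected.append(command)
    return rejected
```

An orchestrator built this way would refuse to forward a job whenever `rejected_commands` returns a non-empty list, and would log the rejected commands alongside the prompt that produced them.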
3. Full Certificate Inventory Report System
Nightly inventory scans feed dashboards and compliance reports.
3.1 Inventory Scan JCL
//CERTSCAN JOB (ACCT),'CERT SCAN',CLASS=A,MSGCLASS=X,NOTIFY=&SYSUID
//STEP1 EXEC PGM=IKJEFT01,DYNAMNBR=20
//SYSTSPRT DD SYSOUT=*
//SYSTSIN DD *
RACDCERT CERTAUTH LIST
RACDCERT SITE LIST
RACDCERT ID(*) LIST
/*
3.2 Ring Scan JCL
//RINGSCAN JOB (ACCT),'RING SCAN',CLASS=A,MSGCLASS=X,NOTIFY=&SYSUID
//STEP1 EXEC PGM=IKJEFT01,DYNAMNBR=20
//SYSTSPRT DD SYSOUT=*
//SYSTSIN DD *
RACDCERT ID(*) LISTRING(*)
RACDCERT SITE LISTRING(*)
/*
4. End-to-End Automated Renewal Framework
Detect → plan → approve → execute → verify → update inventory → notify.
4.1 Renewal Template (example)
//CERTRENW JOB (ACCT),'CERT RENEW',CLASS=A,MSGCLASS=X,NOTIFY=&SYSUID
//STEP1 EXEC PGM=IKJEFT01,DYNAMNBR=20
//SYSTSPRT DD SYSOUT=*
//SYSTSIN DD *
RACDCERT ID(&APPID) GENCERT +
SUBJECTSDN(CN('&FQDN') OU('&OU') O('&ORG') C('&C')) +
WITHLABEL('&NEWLABEL') +
KEYUSAGE(HANDSHAKE) +
NOTAFTER(DATE(&YYYY-&MM-&DD))
RACDCERT ID(&APPID) CONNECT(LABEL('&NEWLABEL') RING(&RING) +
USAGE(PERSONAL) DEFAULT)
SETROPTS RACLIST(DIGTCERT) REFRESH
/*
5. Large-Bank Patterns (10,000+ certs)
At scale — environments with tens of thousands of digital certificates spanning hundreds of applications, middleware stacks, and network services — ad-hoc certificate management breaks down. The patterns below represent the operational model adopted by large financial institutions running z/OS at enterprise scale. Each pattern addresses a distinct failure mode that emerges only at volume.
5.1 Standard Ring Naming & Ownership Model per Runtime
Without a naming convention, key rings become impossible to audit. A ring named MYRING tells an auditor nothing about which application owns it, which environment it belongs to, or which runtime uses it. At 10,000+ certificates the cognitive load becomes unmanageable and compliance evidence collection takes days instead of minutes.
The standard pattern enforces a structured ring name that encodes four dimensions: environment, application ID, runtime type, and purpose. Ownership is pinned to a dedicated, least-privilege RACF user ID that maps 1-to-1 with the application's service account — never a shared or human ID.
Ring name schema
{ENV}.{APPID}.{RUNTIME}.{PURPOSE}
ENV = P (prod) | T (test) | D (dev)
APPID = 6-char application code from CMDB
RUNTIME = ATTLS | CICS | IMS | MQ | ZCON | DB2
PURPOSE = TLS | SIGN | AUTH | ENCRYPT
/* Examples */
P.PAYMT1.ATTLS.TLS /* Prod, Payment app, AT-TLS, TLS handshake */
T.PAYMT1.ATTLS.TLS /* Test equivalent */
P.ORDMGT.CICS.TLS /* Prod, Order Mgmt, CICS, TLS */
P.ORDMGT.MQ.AUTH /* Prod, Order Mgmt, MQ, client auth */
/* Ownership: ring owner = application service ID */
RACDCERT ID(PAYMT1P) ADDRING(P.PAYMT1.ATTLS.TLS)
RACDCERT ID(PAYMT1P) CONNECT(LABEL('P.PAYMT1.TLS.2026') +
RING(P.PAYMT1.ATTLS.TLS) USAGE(PERSONAL) DEFAULT)
RACDCERT ID(PAYMT1P) CONNECT(CERTAUTH LABEL('INTERNAL ROOT CA') +
RING(P.PAYMT1.ATTLS.TLS) USAGE(CERTAUTH))
| Dimension | Enforced by | Benefit |
|---|---|---|
| Ring name schema | Automation orchestrator rejects non-conforming names | Instant owner/env lookup from ring name alone |
| 1-to-1 service ID ownership | RACF RRSF policy + change management gate | Clear accountability; no shared-ID audit gaps |
| CMDB linkage | Ring create workflow writes back to CMDB | Certificates traceable to business application |
| Env prefix | Promotion pipeline enforces P/T/D prefix | Prevents accidental prod cert use in test |
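The ring name schema lends itself to a mechanical check. The sketch below shows one way an orchestrator might reject non-conforming names; the function name is hypothetical, and the assumption that the 6-character application code is uppercase alphanumeric is ours, not stated by the schema.

```python
import re

# {ENV}.{APPID}.{RUNTIME}.{PURPOSE}
# Assumption: APPID is exactly six uppercase letters or digits.
RING_NAME = re.compile(
    r"(?P<env>[PTD])\."
    r"(?P<app_id>[A-Z0-9]{6})\."
    r"(?P<runtime>ATTLS|CICS|IMS|MQ|ZCON|DB2)\."
    r"(?P<purpose>TLS|SIGN|AUTH|ENCRYPT)"
)


def parse_ring_name(name):
    """Return the four schema dimensions, or None if the name does not conform."""
    match = RING_NAME.fullmatch(name)
    return match.groupdict() if match else None
```

Because the name decomposes into named fields, the same function can serve the owner and environment lookup described in the table above.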
5.2 Internal PKI Integration (Automated CSR Issuance)
Large banks operate their own internal Certificate Authority (CA) hierarchy — typically a two-tier offline Root CA and one or more online Issuing CAs — rather than relying on public CAs for internal workloads. This gives the security team full control over certificate policy, validity periods, key usage extensions, and revocation.
The automation pipeline bridges RACF and the internal PKI without human intervention: RACF generates the CSR on-platform using RACDCERT GENREQ, the orchestrator submits it to the CA's ACME or SCEP endpoint, and the signed certificate is imported back via RACDCERT ADD. No engineer ever touches a private key.
/* Steps 1-2: generate the key pair on z/OS, then write the CSR to a data set */
//GENCSRJB JOB (ACCT),'GEN CSR',CLASS=A,MSGCLASS=X
//STEP1 EXEC PGM=IKJEFT01
//SYSTSPRT DD SYSOUT=*
//SYSTSIN DD *
RACDCERT ID(PAYMT1P) GENCERT +
SUBJECTSDN(CN('paymt1.internal.bank.com') +
OU('Payments') O('First National Bank') C('US')) +
WITHLABEL('P.PAYMT1.TLS.2026') +
KEYUSAGE(HANDSHAKE)
RACDCERT ID(PAYMT1P) GENREQ(LABEL('P.PAYMT1.TLS.2026')) +
DSN('PAYMT1P.CERT.CSR')
/*
/* Step 3: orchestrator submits the CSR to the CA via ACME/SCEP */
/* Step 4: import the signed certificate; it replaces the placeholder self-signed certificate */
RACDCERT ADD('PAYMT1P.CERT.SIGNED') +
ID(PAYMT1P) WITHLABEL('P.PAYMT1.TLS.2026') TRUST
/* Step 5: connect to the ring and set as DEFAULT */
RACDCERT ID(PAYMT1P) CONNECT(LABEL('P.PAYMT1.TLS.2026') +
RING(P.PAYMT1.ATTLS.TLS) USAGE(PERSONAL) DEFAULT)
SETROPTS RACLIST(DIGTCERT) REFRESH
Key design decisions
- Private key never leaves z/OS: the key pair is created in the RACF database, and only the CSR (public material) is exported.
- CA policy enforces 1-year max validity, SHA-256 signature, 2048-bit RSA or P-256 ECDSA minimum.
- Orchestrator records CA transaction ID in inventory DB for full audit trail from CSR to signed cert.
- Failure at any step triggers rollback: old certificate remains DEFAULT until new one is verified.
- SETROPTS RACLIST(DIGTCERT) REFRESH is always the final step — without it, z/OS subsystems see stale data.
5.3 Nightly Inventory Refresh & Weekly Compliance Reporting
A certificate inventory is only useful if it reflects reality. In large environments, certificates are added, removed, and modified by multiple teams across multiple LPARs. A nightly automated scan reconciles the live RACF database against the central inventory, flags drift, and feeds a weekly compliance report distributed to application owners and the security governance team.
/* Nightly inventory job: released by the batch job scheduler at 02:00 */
//NIGHTLY JOB (ACCT),'NIGHTLY INV',CLASS=B,MSGCLASS=X
//* Assumes GDG base CERT.INVENTORY exists; (+1) adds a generation
//SCAN EXEC PGM=IKJEFT01,DYNAMNBR=20
//SYSTSPRT DD DSN=CERT.INVENTORY(+1),DISP=(NEW,CATLG),
// SPACE=(CYL,(5,2)),LRECL=255,RECFM=VB
//SYSTSIN DD *
RACDCERT CERTAUTH LIST
RACDCERT SITE LIST
RACDCERT ID(*) LIST
RACDCERT ID(*) LISTRING(*)
RACDCERT SITE LISTRING(*)
/*
//* Off-platform: the orchestrator parses the scan output, upserts the
//* inventory DB, computes expiry deltas, flags drift vs prior night
| Report | Frequency | Audience | Key metrics |
|---|---|---|---|
| Expiry horizon | Daily | App owners | Certs expiring in 30 / 60 / 90 days |
| Drift report | Daily | Security ops | Certs added/removed since last scan |
| Compliance summary | Weekly | CISO / governance | Policy violations, unconnected certs, missing CA anchors |
| Full inventory export | Weekly | Audit / GRC | All certs, owners, rings, expiry, trust status |
| LPAR delta report | Weekly | Platform team | Cross-LPAR cert discrepancies via RRSF |
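As an illustration of the expiry horizon report, the sketch below buckets certificates into the 30 / 60 / 90 day windows. The function names and bucket labels are hypothetical; only the thresholds come from the table.

```python
from collections import Counter
from datetime import date


def expiry_bucket(not_after, today):
    """Classify a certificate by days remaining until its NOTAFTER date."""
    days_left = (not_after - today).days
    if days_left < 0:
        return "expired"
    for limit in (30, 60, 90):
        if days_left <= limit:
            return f"<= {limit} days"
    return "> 90 days"


def expiry_horizon(not_after_dates, today):
    """Count certificates per bucket for the daily report."""
    return Counter(expiry_bucket(d, today) for d in not_after_dates)
```

For example, a certificate expiring on 2026-04-19 evaluated on 2026-03-05 has 45 days left and lands in the 60-day bucket.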
Compliance checks in the weekly report
- Expiring ≤90 days and not in renewal pipeline → escalation required.
- Certificates connected to no ring → orphan; candidate for deletion after owner confirmation.
- Rings with no CERTAUTH entry → TLS handshake will fail if peer presents a chain; flag immediately.
- Labels or DNs that deviate from naming standard → policy violation; remediation ticket raised.
- Self-signed certs in production rings → exception required or replace with CA-signed.
- Certificates with MD5 or SHA-1 signatures → cryptographic hygiene violation; mandatory replacement.
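Several of the checks above can be expressed directly against an inventory record. The following is a sketch under stated assumptions: the record is a dictionary whose `not_after` and `connected_rings` keys follow the inventory fields used elsewhere in this handbook, while `in_renewal_pipeline`, `is_self_signed`, and `signature_hash` are hypothetical fields introduced for illustration. The ring-level CERTAUTH check is omitted because it needs ring contents, not just certificate records.

```python
from datetime import date

WEAK_SIGNATURE_HASHES = {"MD5", "SHA1"}


def compliance_violations(cert, today):
    """Return the list of weekly-report violations that apply to one certificate."""
    found = []
    days_left = (date.fromisoformat(cert["not_after"]) - today).days
    if days_left <= 90 and not cert.get("in_renewal_pipeline", False):
        found.append("expiring-unplanned")
    if not cert["connected_rings"]:
        found.append("orphan")
    # Production rings carry the "P." environment prefix from the naming schema.
    in_prod = any(ring.startswith("P.") for ring in cert["connected_rings"])
    if cert.get("is_self_signed") and in_prod:
        found.append("self-signed-in-prod")
    if cert.get("signature_hash", "").upper().replace("-", "") in WEAK_SIGNATURE_HASHES:
        found.append("weak-signature")
    return found
```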
5.4 Auto-Ticket if <60 Days and Not in Renewal Pipeline
Certificate expiry outages are entirely preventable — they happen because no one acted on the warning. The auto-ticketing pattern closes that gap by making inaction operationally impossible: if a certificate crosses the 60-day threshold and the inventory database shows no active renewal work item, the orchestrator automatically raises a ticket in the ITSM platform (ServiceNow, Jira, or equivalent), assigns it to the registered application owner, and begins escalation if the ticket is not acknowledged within the SLA window.
/* Pseudo-logic executed nightly by the orchestrator */
FOR EACH cert IN inventory WHERE cert.expiry_days <= 60:
    IF cert.renewal_status IN ('IN_PROGRESS', 'APPROVED', 'PENDING_DEPLOYMENT'):
        CONTINUE   /* already in the renewal pipeline */
    ticket = ITSM.lookup(cert.renewal_ticket_id)   /* NULL if no ticket exists */
    IF ticket IS NULL:
        ticket = ITSM.create({
            title: 'CERT EXPIRY: ' + cert.label + ' expires in ' + cert.expiry_days + ' days',
            priority: cert.expiry_days <= 14 ? 'P1-Critical' :
                      cert.expiry_days <= 30 ? 'P2-High' : 'P3-Medium',
            assignee: CMDB.lookup_owner(cert.app_id),
            details: cert.ring + ' / ' + cert.label + ' / ' + cert.subject_dn,
            due_date: cert.expiry_date - 7 days
        })
        inventory.update(cert.id, { renewal_ticket_id: ticket.id })
        NOTIFY(cert.app_owner, cert.security_contact, ticket)
    ELSE IF ticket.age_hours > SLA_HOURS AND ticket.status == 'OPEN':
        ESCALATE(ticket, manager_of(cert.app_owner))
| Days to expiry | Priority | SLA to acknowledge | Escalation path |
|---|---|---|---|
| 31–60 days | P3 — Medium | 5 business days | App owner → security ops |
| 15–30 days | P2 — High | 24 hours | App owner → manager → CISO |
| 1–14 days | P1 — Critical | 4 hours | War-room: app owner + platform + security |
| Expired | P1 — Incident | Immediate | Incident bridge; emergency renewal SOP |
What counts as "in the renewal pipeline"?
- An open ITSM ticket with status IN_PROGRESS, APPROVED, or PENDING_DEPLOYMENT.
- A new certificate with the same CN/SAN already connected to the ring (cutover pending).
- An approved change record referencing the certificate label in the CMDB.
- A CSR already submitted to the internal CA and awaiting signing (tracked by CA transaction ID in inventory).
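The priority table and the pipeline definition above can be combined into one decision function. This is a minimal sketch: the field names `renewal_status`, `replacement_connected`, `approved_change_id`, and `ca_transaction_id` are hypothetical stand-ins for the four conditions in the list, and treating zero days remaining as expired is an assumption the table does not settle.

```python
PIPELINE_TICKET_STATES = {"IN_PROGRESS", "APPROVED", "PENDING_DEPLOYMENT"}


def ticket_priority(days_to_expiry):
    """Map days remaining to the priority table; None means no ticket is due yet."""
    if days_to_expiry <= 0:  # assumption: day zero counts as expired
        return "P1-Incident"
    if days_to_expiry <= 14:
        return "P1-Critical"
    if days_to_expiry <= 30:
        return "P2-High"
    if days_to_expiry <= 60:
        return "P3-Medium"
    return None


def in_renewal_pipeline(cert):
    """True if any of the four pipeline conditions holds for this certificate."""
    return (
        cert.get("renewal_status") in PIPELINE_TICKET_STATES
        or bool(cert.get("replacement_connected"))
        or bool(cert.get("approved_change_id"))
        or bool(cert.get("ca_transaction_id"))
    )


def needs_ticket(cert, days_to_expiry):
    """True when the certificate is inside the 60-day window with no renewal under way."""
    return ticket_priority(days_to_expiry) is not None and not in_renewal_pipeline(cert)
```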
5.5 Using AI to Automate RACF Certificate Management
AI does not replace the RACF security engineer — it eliminates the repetitive, error-prone work that consumes most of their time: translating a certificate request into correct JCL, parsing thousands of lines of RACDCERT LIST output into structured data, spotting anomalies in a 10,000-row inventory, and drafting runbooks from incident post-mortems. The architecture principle is firm: AI generates; humans (or policy-gated automation) approve and execute. No AI component ever holds RACF credentials or submits JCL directly.
| Capability | AI role | Tools / models | Human gate |
|---|---|---|---|
| JCL generation | Draft RACDCERT JCL from natural-language request | GPT-4o / Claude 3.5 via prompt template | Engineer reviews diff before submission |
| Inventory parsing | Extract structured fields from raw RACDCERT LIST output | LLM extraction or regex + LLM fallback | Automated; anomalies flagged for review |
| Expiry forecasting | Predict renewal backlog and resource demand 90 days out | Time-series model on inventory DB | Report reviewed by security ops weekly |
| Anomaly detection | Flag unusual cert attributes, unexpected ring changes | Embedding similarity + rule engine | Alert routed to security analyst |
| Runbook generation | Draft SOP from incident description or change request | GPT-4o with RACF context in system prompt | Security architect approves before publishing |
| CSR field validation | Check DN, SAN, key usage against policy before submission | LLM + JSON schema validator | Automated block; override requires approval |
| Root-cause analysis | Diagnose handshake failures from JES output + SMF records | RAG over RACF knowledge base | Engineer confirms diagnosis before remediation |
5.5.1 AI-Driven JCL Generation
The most immediate AI win is turning a plain-English certificate request into syntactically correct, policy-compliant RACDCERT JCL. An engineer describes what they need; the LLM produces the JCL; the orchestrator validates it against the allow-list before a human approves submission. This eliminates the most common source of certificate incidents: hand-crafted JCL with typos in label names, missing SETROPTS RACLIST(DIGTCERT) REFRESH, or incorrect USAGE flags.
/* System prompt fragment injected into every LLM request */
You are a z/OS RACF security automation assistant.
Rules you MUST follow:
1. Only emit RACDCERT, SETROPTS, and RRSF verbs.
2. Always end JCL with SETROPTS RACLIST(DIGTCERT) REFRESH.
3. Ring names must match schema {ENV}.{APPID}.{RUNTIME}.{PURPOSE}.
4. NOTAFTER date must not exceed 365 days from today.
5. KEYUSAGE must be HANDSHAKE for TLS, DATAENCRYPT for encryption.
6. Never emit DELETE, EXPORT, or IMPORT without explicit user confirmation.
7. Output ONLY valid JCL — no prose, no markdown fences.
/* Example user prompt */
"Create a prod AT-TLS certificate for the PAYMT1 service,
CN=paymt1.internal.bank.com, OU=Payments, valid 1 year,
connect to ring P.PAYMT1.ATTLS.TLS as DEFAULT."
/* LLM output (reviewed by engineer before execution) */
//GENCRTJB JOB (ACCT),'AI GEN CERT',CLASS=A,MSGCLASS=X
//STEP1 EXEC PGM=IKJEFT01
//SYSTSPRT DD SYSOUT=*
//SYSTSIN DD *
RACDCERT ID(PAYMT1P) GENCERT +
SUBJECTSDN(CN('paymt1.internal.bank.com') +
OU('Payments') O('First National Bank') C('US')) +
WITHLABEL('P.PAYMT1.TLS.2026') +
KEYUSAGE(HANDSHAKE) +
NOTAFTER(DATE(2027-03-05))
RACDCERT ID(PAYMT1P) CONNECT(LABEL('P.PAYMT1.TLS.2026') +
RING(P.PAYMT1.ATTLS.TLS) USAGE(PERSONAL) DEFAULT)
SETROPTS RACLIST(DIGTCERT) REFRESH
/*
5.5.2 AI-Assisted Inventory Parsing & Structuring
Raw RACDCERT LIST output is free-form text — not JSON, not CSV. Parsing it with brittle regex breaks whenever IBM changes the output format across z/OS releases. An LLM extraction layer handles format variation gracefully: it reads the raw SYSTSPRT output and emits a structured JSON record per certificate, which is then upserted into the inventory database.
/* Orchestrator calls LLM extraction API with raw RACDCERT output */
PROMPT:
Extract all certificates from the following RACDCERT LIST output.
Return a JSON array. Each object must have:
label, owner_id, subject_dn, issuer_dn, not_before,
not_after, key_type, key_size, serial, trust_status,
connected_rings (array of ring names)
If a field is absent, use null.
Output ONLY valid JSON — no prose.
INPUT: <raw SYSTSPRT spool output>
/* LLM returns */
[
{
"label": "P.PAYMT1.TLS.2026",
"owner_id": "PAYMT1P",
"subject_dn": "CN=paymt1.internal.bank.com,OU=Payments,O=First National Bank,C=US",
"issuer_dn": "CN=Internal Issuing CA 1,O=First National Bank,C=US",
"not_before": "2026-03-05",
"not_after": "2027-03-05",
"key_type": "RSA",
"key_size": 2048,
"serial": "04A3F2",
"trust_status": "TRUST",
"connected_rings": ["P.PAYMT1.ATTLS.TLS"]
}
]
5.5.3 Anomaly Detection & Security Alerting
At 10,000+ certificates, no human reviews every nightly diff. An AI anomaly layer compares each night's inventory snapshot against the prior baseline and flags deviations that warrant investigation. Embedding-based similarity catches semantic anomalies (e.g., a label that looks almost like a legitimate one but differs by one character — a classic insider-threat indicator). Rule-based checks catch structural violations. Both feed a unified alert queue.
Anomaly signals the AI layer monitors
- New certificate added outside of a change window or without a matching ITSM ticket.
- Certificate connected to a ring it has never appeared in before (lateral movement indicator).
- Validity period longer than policy maximum (365 days) — may indicate a manual bypass.
- Subject DN deviates from the naming standard for its application ID — possible misconfiguration or spoofing.
- Self-signed certificate appearing in a production ring that previously held only CA-signed certs.
- TRUST status change on a CERTAUTH entry without a corresponding change record.
- Spike in RACDCERT DELETE operations by a single user ID within a short window.
- Certificate label with high edit-distance similarity to an existing label (typosquatting).
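The typosquatting signal in the last bullet rests on an edit-distance comparison. A minimal sketch using the standard Levenshtein dynamic-programming recurrence follows; the function names and the default threshold of 2 are illustrative choices.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum single-character inserts, deletes, or substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (0 if char_a == char_b else 1)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]


def near_duplicates(new_label, existing_labels, max_distance=2):
    """Return existing labels that differ from new_label by 1..max_distance edits."""
    return [
        label
        for label in existing_labels
        if 0 < edit_distance(new_label, label) <= max_distance
    ]
```

Note that a routine renewal label such as `P.PAYMT1.TLS.2027` is one edit away from `P.PAYMT1.TLS.2026`, so a practical check would probably strip or normalise the year suffix before comparing.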
/* Nightly anomaly detection pseudo-logic */
current = inventory.snapshot(tonight)
baseline = inventory.snapshot(last_night)
delta = diff(current, baseline)
FOR EACH change IN delta:
    /* Rule-based checks */
    IF change.type == 'ADD' AND NOT itsm.has_open_ticket(change.label):
        alert(severity='HIGH', msg='Unauthorised cert addition: ' + change.label)
    IF change.cert.not_after - today > 365 days:
        alert(severity='MEDIUM', msg='Validity exceeds policy: ' + change.label)
    IF change.cert.is_self_signed AND change.ring.env == 'P':
        alert(severity='HIGH', msg='Self-signed cert in prod ring: ' + change.ring)
    /* Similarity check: edit distance shown as the simplest near-duplicate test */
    FOR EACH existing_label IN inventory.all_labels():
        IF edit_distance(change.label, existing_label) IN [1, 2]:
            alert(severity='MEDIUM',
                  msg='Near-duplicate label detected: ' + change.label +
                      ' vs ' + existing_label)
5.5.4 RAG-Powered Runbook & Diagnostic Assistant
Retrieval-Augmented Generation (RAG) grounds the LLM in your organisation's own knowledge: SOPs, past incident post-mortems, IBM documentation, and this handbook. Engineers query it in natural language during an incident and receive step-by-step guidance drawn from verified internal sources — not hallucinated from the model's training data. The knowledge base is updated automatically whenever a new SOP is approved or a post-mortem is closed.
RAG knowledge base sources
- IBM z/OS RACF documentation: Chunked by command; embedded at ingest
- Internal SOPs (SOP-1 through SOP-N): Auto-ingested on approval in ITSM
- Incident post-mortems: Root cause + resolution indexed
- CMDB application registry: Owner, ring, and cert metadata
- Change records (approved): Provides context for what changed when
- This handbook: Versioned; re-indexed on each publish
/* Example RAG query during an AT-TLS handshake failure incident */
Engineer: "AT-TLS handshake failing for PAYMT1 in prod.
JES output shows IRR912I. What do I check first?"
Assistant (grounded in internal knowledge base):
Based on SOP-7 (AT-TLS Diagnostic) and post-mortem INC0042381:
1. Verify the certificate is connected as DEFAULT in the ring:
RACDCERT ID(PAYMT1P) LISTRING(P.PAYMT1.ATTLS.TLS)
2. Check the CERTAUTH anchor is present in the ring:
RACDCERT ID(PAYMT1P) LISTRING(P.PAYMT1.ATTLS.TLS)
Look for USAGE(CERTAUTH) entry matching the peer's issuer.
3. Confirm SETROPTS RACLIST(DIGTCERT) REFRESH was run after
the last certificate change (INC0042381 root cause: this
step was skipped during the 2025-11-14 renewal).
4. Check certificate expiry:
RACDCERT ID(PAYMT1P) LIST(LABEL('P.PAYMT1.TLS.2026'))
5. If all above pass, collect SMF type 82 records and open
a ticket per SOP-7 section 4.3.
5.5.5 AI Governance Guardrails
Every AI component in the pipeline operates within a strict governance envelope. The guardrails are not optional enhancements — they are the architectural precondition that makes AI use acceptable to audit, risk, and compliance functions.
| Guardrail | Implementation | Rationale |
|---|---|---|
| No direct RACF access | AI API calls are network-isolated from z/OS; only the orchestrator holds credentials | Prevents AI model compromise from becoming a RACF breach |
| Verb allow-list | Orchestrator rejects any JCL containing verbs outside an approved set | Stops prompt injection from escalating to destructive operations |
| Output schema validation | All LLM outputs parsed against JSON schema before use; malformed output is discarded | Prevents hallucinated fields from corrupting inventory DB |
| Prompt audit log | Every prompt, model response, and approval decision is logged immutably | Full audit trail for regulatory examination |
| Human approval gate | No JCL reaches JES without an explicit approve action by a named engineer | Maintains separation of duties; AI is advisory only |
| Model version pinning | Production pipeline pins to a specific model version; updates go through change management | Prevents silent behaviour changes from model updates |
| Data residency | LLM API calls contain no actual private key material; only labels, DNs, and metadata | Sensitive cryptographic material never leaves z/OS |
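The output schema validation guardrail can be illustrated with a strict parser for the inventory extraction response. This sketch uses only the standard library rather than a JSON Schema package; the field list mirrors the extraction prompt in Section 5.5.2, and the function name is hypothetical.

```python
import json

# Field names and types taken from the extraction prompt; null is permitted
# for any field, as the prompt instructs the model to use null when absent.
REQUIRED_FIELDS = {
    "label": str, "owner_id": str, "subject_dn": str, "issuer_dn": str,
    "not_before": str, "not_after": str, "key_type": str, "key_size": int,
    "serial": str, "trust_status": str, "connected_rings": list,
}


def parse_llm_inventory(raw_text):
    """Return validated records; raise ValueError so malformed output is discarded."""
    try:
        records = json.loads(raw_text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(records, list):
        raise ValueError("top-level value must be an array")
    for index, record in enumerate(records):
        if not isinstance(record, dict) or set(record) != set(REQUIRED_FIELDS):
            raise ValueError(f"record {index}: missing or unexpected fields")
        for field, expected in REQUIRED_FIELDS.items():
            value = record[field]
            if value is None:
                continue
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, expected):
                raise ValueError(f"record {index}: {field} has the wrong type")
    return records
```

Rejecting unexpected keys, not just missing ones, is what keeps a hallucinated field from reaching the inventory database.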
6. Application Playbooks
AT-TLS
RACDCERT ID(TCPSRV) ADDRING(ATTLSRING)
RACDCERT ID(TCPSRV) CONNECT(LABEL('SERVER TLS CERT') +
RING(ATTLSRING) USAGE(PERSONAL) DEFAULT)
RACDCERT ID(TCPSRV) CONNECT(CERTAUTH LABEL('ROOT CA') +
RING(ATTLSRING) USAGE(CERTAUTH))
RACDCERT ID(TCPSRV) LISTRING(ATTLSRING)
CICS
RACDCERT ID(CICSUSR) ADDRING(CICSRING)
RACDCERT ID(CICSUSR) CONNECT(LABEL('CICS TLS CERT') +
RING(CICSRING) USAGE(PERSONAL) DEFAULT)
RACDCERT ID(CICSUSR) LISTRING(CICSRING)
z/OS Connect
RACDCERT ID(ZCONUSR) ADDRING(ZCONRING)
RACDCERT ID(ZCONUSR) CONNECT(LABEL('ZCON TLS CERT') +
RING(ZCONRING) USAGE(PERSONAL) DEFAULT)
RACDCERT ID(ZCONUSR) LISTRING(ZCONRING)
7. Troubleshooting
Standard evidence checklist + quick diagnostics.
RACDCERT ID(APPID) LISTRING(RINGNAME)
RACDCERT ID(APPID) LIST(LABEL('CERT LABEL'))
RACDCERT CERTAUTH LIST
RACDCERT SITE LIST
8. RACDCERT Encyclopedia (curated)
- GENCERT / DELETE / LIST / LISTRING: Core certificate lifecycle commands
- ADDRING / DELRING / CONNECT / REMOVE: Key ring management operations
- EXPORT / IMPORT (PKCS12, CERTB64): Certificate format interchange
- GENREQ (CSR): Certificate signing request generation
- TRUST / CERTAUTH management: Trust chain and CA anchor control
9. SOP Runbooks
Self-signed issuance
Generate and connect a self-signed certificate to a key ring without an external CA.
CA-signed issuance
Generate a CSR, submit to internal PKI, import the signed certificate, and connect it.
Renewal and cutover
Generate a new certificate, connect as DEFAULT, verify, remove the old certificate from the ring.
10. Appendix
Recommended weekly compliance report: expiring 30/60/90 days, unconnected certs, rings missing CA anchors, non-standard labels/DNs.
| Check | Threshold | Action |
|---|---|---|
| Expiring certs | 30 / 60 / 90 days | Auto-ticket + renewal pipeline |
| Unconnected certs | Any | Review and connect or delete |
| Rings missing CA anchors | Any | Add CERTAUTH entry |
| Non-standard labels/DNs | Policy deviation | Flag for remediation |
Generated 2026-03-05 • Static site (index.html + styles.css)
11. AI Tool Evaluation Matrix
Choosing an AI platform for RACF certificate automation is not purely a technology decision — it is a risk and compliance decision. The platform must satisfy data residency requirements, integrate with the z/OS execution layer (Zowe), and close the ticket loop with the ITSM platform (ServiceNow). The matrix below evaluates the four leading enterprise AI platforms against the criteria that matter in a regulated financial institution.
| Criterion | IBM watsonx | Azure OpenAI | AWS Bedrock | Self-hosted (Ollama/vLLM) |
|---|---|---|---|---|
| On-prem deployment | Yes — Cloud Pak for Data | Limited — Arc-enabled | No (cloud only) | Yes — full on-prem |
| Data residency | EU/US region isolation | Sovereign cloud options | Region-locked endpoints | Complete — no egress |
| z/OS / Zowe integration | Native via IBM Z AI | Via Zowe REST + Azure Logic Apps | Via Zowe REST + Lambda | Via Zowe REST + local orchestrator |
| ServiceNow integration | MID Server + Flow Designer | ServiceNow Spoke for Azure | ServiceNow Spoke for AWS | Custom REST integration |
| RACF fine-tuning support | Yes — Granite models | Yes — GPT-4o fine-tune | Yes — Titan/Llama fine-tune | Yes — LoRA on any model |
| Audit logging | IBM OpenPages integration | Azure Monitor + Sentinel | CloudTrail + Security Hub | Local log pipeline required |
| Latency (interactive JCL review) | ~1–2 s (on-prem) | ~0.5–1.5 s | ~0.8–2 s | ~0.3–1 s (GPU) |
| Regulatory certifications | SOC2, ISO27001, FedRAMP | SOC2, ISO27001, FedRAMP High | SOC2, ISO27001, FedRAMP High | Depends on infrastructure |
| Cost model | Enterprise licence | Per-token + infra | Per-token + infra | CapEx (GPU hardware) |
11.1 Zowe as the AI-to-z/OS Execution Bridge
Zowe is the open-source framework that gives modern tooling — including AI orchestrators running off-platform — a secure, REST-based interface to z/OS. In the AI automation architecture, Zowe plays a single critical role: it is the only component that touches z/OS. The AI model never calls Zowe directly; the policy-gated orchestrator does, and only after a human has approved the generated JCL.
Zowe exposes three interfaces relevant to certificate automation: the Zowe CLI for scripted JCL submission and spool retrieval, the z/OSMF REST API for programmatic job management, and the Zowe Explorer VS Code extension for engineer-facing interactive review before approval.
# ── Zowe CLI: full AI-to-z/OS automation flow ──────────────────────
# 1. Orchestrator writes AI-generated JCL to a pre-allocated sequential data set
zowe files upload file-to-data-set ./ai_generated.jcl "PAYMT1P.AI.GENCERT.JCL"
# 2. Engineer reviews JCL in Zowe Explorer (VS Code) — APPROVAL STEP
# zowe files download data-set "PAYMT1P.AI.GENCERT.JCL"
# 3. After approval, orchestrator submits the job
zowe jobs submit data-set "PAYMT1P.AI.GENCERT.JCL" --view-all-spool-content
# 4. Retrieve JES output for inventory parsing by AI
zowe jobs list jobs --owner PAYMT1P --prefix GENCRTJB
zowe jobs view spool-file-by-id JOB12345 2 > spool_output.txt
# 5. Pipe spool output to LLM extraction for inventory update
jq -n --rawfile spool spool_output.txt '{prompt: ("Extract cert fields from: " + $spool)}' | curl -s -X POST "$LLM_API_URL" -H "Authorization: Bearer $LLM_API_KEY" -H "Content-Type: application/json" --data-binary @- | jq '.result' >> inventory_db_upsert.json
# 6. Verify certificate is now in RACF
zowe tso issue command "RACDCERT ID(PAYMT1P) LIST(LABEL('P.PAYMT1.TLS.2026'))"
Key Zowe components used
- zowe files: Upload AI-generated JCL to z/OS datasets; download spool for review
- zowe jobs submit: Submit approved JCL to JES after human gate
- zowe jobs view spool: Retrieve RACDCERT LIST output for AI parsing
- zowe tso issue: Run interactive RACDCERT commands for verification
- zowe config: Team profiles store z/OSMF endpoint and credential refs (not secrets)
- Zowe Explorer (VS Code): Engineer approval UI; view diff of AI JCL before submission
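For orchestrators that call the z/OSMF REST API rather than the CLI, job submission is a PUT of the JCL text to the z/OSMF jobs endpoint. The sketch below only builds the request so it can be inspected before anything is sent; the function name and host are hypothetical, and the authorization scheme is left to the caller because it depends on how the site's z/OSMF is configured.

```python
import urllib.request


def build_submit_request(zosmf_base_url, jcl_text, authorization):
    """Build (but do not send) a z/OSMF 'submit job' request for approved JCL."""
    return urllib.request.Request(
        url=f"{zosmf_base_url.rstrip('/')}/zosmf/restjobs/jobs",
        data=jcl_text.encode("utf-8"),
        method="PUT",
        headers={
            "Content-Type": "text/plain",
            # z/OSMF requires this header on state-changing requests.
            "X-CSRF-ZOSMF-HEADER": "true",
            # Assumption: caller supplies a complete Authorization header value.
            "Authorization": authorization,
        },
    )
```

Keeping construction separate from sending lets the orchestrator log the exact request next to the approval record before calling `urllib.request.urlopen`.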
11.2 ServiceNow as the AI Approval & Ticket Orchestration Layer
ServiceNow is the governance backbone of the AI pipeline. Every AI-generated JCL artefact is attached to a ServiceNow Change Request before it can be submitted to z/OS. The approval workflow, SLA timers, escalation paths, and audit trail all live in ServiceNow — not in a custom database. This means the AI automation is immediately visible to risk, audit, and compliance teams through their existing tooling, with no new dashboards to build.
# ── ServiceNow REST API: AI pipeline integration ───────────────────
# 1. Orchestrator creates a Change Request with AI-generated JCL attached
curl -X POST https://INSTANCE.service-now.com/api/now/table/change_request -H "Authorization: Bearer $SN_TOKEN" -H "Content-Type: application/json" -d '{
"short_description": "AI-generated cert renewal: P.PAYMT1.TLS.2026",
"description": "RACDCERT GENCERT + CONNECT for PAYMT1P. Expiry: 2027-03-05.",
"assignment_group": "RACF Security Engineering",
"category": "Certificate Management",
"cmdb_ci": "PAYMT1 Payment Service",
"u_ai_generated": "true",
"u_jcl_content": "<base64-encoded JCL>",
"u_zowe_dataset": "PAYMT1P.AI.GENCERT.JCL"
}'
# 2. Orchestrator polls for approval state
curl -X GET "https://INSTANCE.service-now.com/api/now/table/change_request?sysparm_query=number=CHG0012345" -H "Authorization: Bearer $SN_TOKEN" | jq '.result[0].state' # "approved" → proceed; else wait/abort
# 3. On approval: orchestrator calls Zowe to submit JCL (see 11.1)
# On rejection: orchestrator updates ticket and notifies engineer
# 4. After execution: update Change Request with outcome
curl -X PATCH "https://INSTANCE.service-now.com/api/now/table/change_request/SYS_ID" -H "Authorization: Bearer $SN_TOKEN" -d '{
"state": "closed_successful",
"close_notes": "JCL executed. JES job JOB12345 completed RC=0. Cert verified.",
"u_jes_job_id": "JOB12345",
"u_cert_expiry": "2027-03-05"
}'
# 5. Auto-ticket for expiry alert (from nightly orchestrator)
curl -X POST https://INSTANCE.service-now.com/api/now/table/incident -H "Authorization: Bearer $SN_TOKEN" -d '{
"short_description": "CERT EXPIRY WARNING: P.PAYMT1.TLS.2026 expires in 45 days",
"urgency": "2",
"impact": "2",
"assignment_group": "RACF Security Engineering",
"u_cert_label": "P.PAYMT1.TLS.2026",
"u_expiry_date": "2026-04-19",
"u_ring": "P.PAYMT1.ATTLS.TLS"
}'
| ServiceNow capability | Role in AI pipeline | Table / API |
|---|---|---|
| Change Request | Human approval gate for every AI-generated JCL artefact | change_request |
| Incident | Auto-raised for expiry alerts and anomaly detections | incident |
| CMDB CI | Links certificate to owning application and service | cmdb_ci_appl |
| Flow Designer | Orchestrates multi-step approval → Zowe submit → verify workflow | sys_hub_flow |
| MID Server | On-premises agent that relays Zowe CLI calls from ServiceNow to z/OS network | ecc_agent |
| Audit Log | Immutable record of every AI prompt, approval, and execution event | sys_audit |
| Assignment Rules | Auto-route tickets to correct RACF team based on ring name prefix | auto_assignment |
End-to-end flow: AI request → ServiceNow → Zowe → z/OS
1. Engineer submits natural-language request to AI assistant (chat UI or ServiceNow Virtual Agent).
2. AI generates RACDCERT JCL and posts it as a draft Change Request in ServiceNow with full context.
3. ServiceNow Flow Designer notifies the assigned RACF engineer; JCL is visible in the CR work notes.
4. Engineer reviews JCL in Zowe Explorer (VS Code) via the linked dataset, approves or rejects the CR.
5. On approval: ServiceNow Flow Designer triggers MID Server → Zowe CLI → JES job submission.
6. Zowe retrieves JES spool output; AI parses it and updates the inventory database.
7. ServiceNow CR is closed with job ID, RC, and verified cert expiry date attached.
8. If anomaly detected post-execution, ServiceNow Incident is auto-raised and linked to the CR.
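The approval step in this flow is the point where separation of duties is enforced. One way to sketch the gate is shown below: it assumes the orchestrator records a SHA-256 digest of the JCL at approval time so that later edits invalidate the approval. The `ChangeRequest` fields and the digest mechanism are hypothetical; only the "approved" state value comes from the polling example in this section.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ChangeRequest:
    number: str
    state: str
    requested_by: str
    approved_by: Optional[str]
    approved_jcl_sha256: str  # hypothetical field: digest captured at approval


def may_submit(change_request, jcl_text):
    """Allow submission only for approved, independently reviewed, unmodified JCL."""
    if change_request.state != "approved":
        return False
    approver = change_request.approved_by
    if not approver or approver == change_request.requested_by:
        return False  # separation of duties: requester cannot self-approve
    digest = hashlib.sha256(jcl_text.encode("utf-8")).hexdigest()
    return digest == change_request.approved_jcl_sha256
```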
12. AI-Assisted Diagnostic Decision Tree
The decision tree below maps the most common RACF certificate failure codes to their root causes and remediation steps. In the AI-assisted model, an engineer pastes the error code and JES output into the RAG assistant (Section 5.5.4); the assistant traverses this logic and returns the relevant branch with organisation-specific context from the knowledge base. Zowe CLI commands are provided at each step for direct execution; ServiceNow actions are noted where a ticket must be raised or updated.
| Error code | Subsystem | Plain-English meaning | Most common cause |
|---|---|---|---|
| IRR912I | AT-TLS / RACF | Certificate not trusted or not found in ring | Missing CERTAUTH anchor or SETROPTS REFRESH not run |
| IRRC0072E | RACF | Certificate label not found for user ID | Label mismatch or cert deleted without ring cleanup |
| ICH408I | RACF | User lacks authority to perform RACDCERT operation | Service ID missing RACF profile permission |
| IRRC0049E | RACF | Certificate already exists with this label | Duplicate label; renewal created before old cert removed |
| IRRC0065E | RACF | Ring does not exist | Ring name typo or ring deleted; orchestrator naming mismatch |
| SSL0222E | z/OS TLS stack | Handshake failed: peer certificate not trusted | CA anchor missing from ring or expired intermediate CA |
| GSKSRVR | GSKit | GSKit SSL error during TLS negotiation | Certificate expired or key type mismatch (RSA vs ECDSA) |
12.1 IRR912I — Certificate Not Trusted / Not Found in Ring
/* Step 1: Identify the ring and owner from the failing application log */
zowe tso issue command "RACDCERT ID(PAYMT1P) LISTRING(P.PAYMT1.ATTLS.TLS)"
/* Look for: PERSONAL cert connected as DEFAULT, CERTAUTH anchor present */
/* Step 2: Check the certificate itself */
zowe tso issue command "RACDCERT ID(PAYMT1P) LIST(LABEL('P.PAYMT1.TLS.2026'))"
/* Verify: TRUST status = TRUST, NOTAFTER date is in the future */
/* Step 3: If CERTAUTH anchor is missing — add it */
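/* ID(PAYMT1P) names the ring owner; without it RACDCERT looks for the ring under the issuing user ID */
zowe tso issue command "RACDCERT ID(PAYMT1P) CONNECT(CERTAUTH LABEL('INTERNAL ROOT CA') RING(P.PAYMT1.ATTLS.TLS) USAGE(CERTAUTH))"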
/* Step 4: ALWAYS run refresh after any ring change */
zowe tso issue command "SETROPTS RACLIST(DIGTCERT) REFRESH"
/* Step 5: Verify fix — rerun LISTRING and confirm CERTAUTH entry present */
zowe tso issue command "RACDCERT ID(PAYMT1P) LISTRING(P.PAYMT1.ATTLS.TLS)"
/* ServiceNow: Update the incident work notes with steps taken and outcome */
# curl -X PATCH .../incident/SYS_ID -d '{"work_notes": "Added CERTAUTH anchor. REFRESH run. Verified."}'
# If root cause was a missed REFRESH after a prior change, link to that CR.
12.2 IRRC0072E — Certificate Label Not Found
/* Step 1: Confirm the label exists for the user */
zowe tso issue command "RACDCERT ID(PAYMT1P) LIST"
/* If label is absent: cert was deleted or never created */
/* Step 2: Check if label exists under a different owner */
zowe tso issue command "RACDCERT CERTAUTH LIST"
zowe tso issue command "RACDCERT SITE LIST"
/* Label may have been created under CERTAUTH or SITE instead of the app ID */
/* Step 3: Check ring for orphaned connections */
zowe tso issue command "RACDCERT ID(PAYMT1P) LISTRING(P.PAYMT1.ATTLS.TLS)"
/* If ring references a label that no longer exists: REMOVE the stale entry */
zowe tso issue command "RACDCERT ID(PAYMT1P) REMOVE(LABEL('OLD.LABEL') RING(P.PAYMT1.ATTLS.TLS))"
/* Step 4: Re-run the renewal SOP (SOP-2 or SOP-3) to recreate the certificate */
/* ServiceNow: If cert was deleted outside of a change window, raise a Security Incident */
# Attach RACDCERT LIST output as evidence; assign to RACF Security Engineering
12.3 ICH408I — Insufficient Authority
/* Step 1: Identify which RACF profile is blocking the operation */
/* ICH408I message includes the profile name and class */
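/* Example: ICH408I USER(AUTOBOT) ... IRR.DIGTCERT.GENCERT CL(FACILITY) INSUFFICIENT ACCESS AUTHORITY */
/* RACDCERT authority is controlled by IRR.DIGTCERT.<function> profiles in the FACILITY class */
/* Step 2: Check current permits on the automation ID */
zowe tso issue command "RLIST FACILITY IRR.DIGTCERT.* AUTHUSER"
/* Or for a specific function: */
zowe tso issue command "RLIST FACILITY IRR.DIGTCERT.GENCERT AUTHUSER"
/* Step 3: Grant minimum required access (least privilege) */
/* Check the RACDCERT authority tables for the access level each function needs */
zowe tso issue command "PERMIT IRR.DIGTCERT.GENCERT CLASS(FACILITY) ID(AUTOBOT) ACCESS(UPDATE)"
zowe tso issue command "SETROPTS RACLIST(FACILITY) REFRESH"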
/* Step 4: Re-test the failing operation */
/* ServiceNow: Raise a Change Request for the PERMIT command */
/* Automation IDs must never receive access outside of a reviewed CR */
# Attach RLIST output (before) and PERMIT command as evidence
12.4 AI + Zowe + ServiceNow: Integrated Diagnostic Flow
In the fully integrated model, an engineer does not need to know which branch of the decision tree to follow. They paste the error into the RAG assistant; the assistant retrieves the relevant SOP branch, generates the Zowe CLI diagnostic commands, and optionally drafts the ServiceNow work note. The engineer executes and approves each step.
# ── Integrated diagnostic session example ──────────────────────────
# Engineer input to RAG assistant:
"AT-TLS handshake failing for PAYMT1 in prod. JES shows IRR912I.
ServiceNow incident INC0089234 is open. What do I do?"
# RAG assistant response (grounded in internal knowledge base):
Based on SOP-7 and post-mortem INC0042381, IRR912I in AT-TLS
typically means a missing CERTAUTH anchor or a stale RACF cache.
Run these Zowe CLI commands in order:
zowe tso issue command "RACDCERT ID(PAYMT1P) LISTRING(P.PAYMT1.ATTLS.TLS)"
→ If CERTAUTH anchor is absent, run:
zowe tso issue command "RACDCERT ID(PAYMT1P) CONNECT(CERTAUTH LABEL('INTERNAL ROOT CA') RING(P.PAYMT1.ATTLS.TLS) USAGE(CERTAUTH))"
zowe tso issue command "SETROPTS RACLIST(DIGTCERT) REFRESH"
Once resolved, update ServiceNow INC0089234 with:
curl -X PATCH .../incident/INC0089234 -d '{"state": "resolved", "close_notes": "Added CERTAUTH anchor. SETROPTS REFRESH run. Handshake verified at 14:32 UTC."}'
If the anchor was already present, collect SMF type 82 records
and escalate per SOP-7 section 4.3.
13. Prompt Library
The prompts below are production-tested templates. Each is designed for a specific task in the RACF automation pipeline and includes a system prompt (injected by the orchestrator) and a user prompt template (filled in by the engineer or automation). All prompts enforce the governance constraints from Section 5.5.5: no credentials, no destructive verbs without confirmation, output schema validation before use. Zowe CLI commands show how to feed real z/OS data into each prompt; ServiceNow fields show where the output is recorded.
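The "no destructive verbs without confirmation" constraint can also be enforced outside the model, on the generated SYSTSIN statements themselves. The sketch below is one possible shape for such a check, assuming the allow-list stated in the P-01 system prompt (GENCERT, CONNECT, REMOVE, plus the DIGTCERT refresh); the function names and the `confirmed_delete` flag are hypothetical and not part of the pipeline described here.

```python
# Allow-list taken from the P-01 system prompt; the rest is illustrative.
ALLOWED_FUNCTIONS = {"GENCERT", "CONNECT", "REMOVE"}
CONFIRM_ONLY = {"DELETE", "EXPORT"}          # need an explicit confirmation
OWNER_KEYWORDS = {"ID", "CERTAUTH", "SITE"}  # owner selectors, not functions
REFRESH = "SETROPTS RACLIST(DIGTCERT) REFRESH"


def join_continuations(lines):
    """Merge lines ending in '+' (TSO continuation) into single statements."""
    statements, current = [], ""
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.endswith("+"):
            current += line[:-1].strip() + " "
        else:
            statements.append(current + line)
            current = ""
    if current:
        statements.append(current.strip())
    return statements


def top_level_keywords(statement):
    """Return keywords that sit outside parentheses and quoted strings."""
    words, word, depth, in_quote = [], "", 0, False
    for ch in statement + " ":
        if in_quote:
            in_quote = ch != "'"
        elif ch == "'":
            in_quote = True
        elif depth == 0 and ch.isalnum():
            word += ch
        else:
            if depth == 0 and word:
                words.append(word.upper())
                word = ""
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
    return words


def check_statements(systsin_lines, confirmed_delete=False):
    """Return a list of violation messages; an empty list means the check passed."""
    violations = []
    for stmt in join_continuations(systsin_lines):
        if " ".join(stmt.upper().split()) == REFRESH:
            continue
        words = top_level_keywords(stmt)
        if not words or words[0] != "RACDCERT":
            violations.append(f"command not permitted: {stmt}")
            continue
        functions = [w for w in words[1:] if w not in OWNER_KEYWORDS]
        function = functions[0] if functions else None
        if function in ALLOWED_FUNCTIONS:
            continue
        if function in CONFIRM_ONLY and confirmed_delete:
            continue
        violations.append(f"RACDCERT function not permitted: {function}")
    return violations
```

Only keywords outside parentheses and quotes are inspected, so a label that happens to contain the word DELETE does not trigger a false violation.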
Certificate Renewal JCL Generator
Use when: engineer requests a renewal for an existing certificate. Output: complete RACDCERT JCL ready for review in Zowe Explorer.
/* SYSTEM PROMPT — P-01 */
You are a z/OS RACF JCL generation assistant operating inside a
policy-gated automation pipeline.
Constraints:
- Output ONLY valid z/OS JCL. No prose, no markdown, no comments outside /* */.
- Permitted verbs: RACDCERT GENCERT, CONNECT, REMOVE, SETROPTS RACLIST(DIGTCERT) REFRESH.
- NOTAFTER must not exceed today + 365 days.
- Ring name must match: {ENV}.{APPID}.{RUNTIME}.{PURPOSE}.
- Always include SETROPTS RACLIST(DIGTCERT) REFRESH as the final SYSTSIN statement.
- Do NOT include DELETE or EXPORT unless the user prompt explicitly says "confirmed delete".
/* USER PROMPT TEMPLATE — P-01 */
Generate a certificate renewal JCL for:
Owner ID: {{OWNER_ID}}
Label: {{NEW_LABEL}}
CN: {{FQDN}}
OU: {{OU}}
O: {{ORG}}
C: {{COUNTRY}}
Ring: {{RING_NAME}}
Valid until: {{NOTAFTER_DATE}}
Key usage: {{KEYUSAGE}}
Replace old label: {{OLD_LABEL}} (connect new as DEFAULT, leave old connected)
/* HOW TO FEED REAL DATA — Zowe CLI */
# Get current cert details to populate the template:
zowe tso issue command "RACDCERT ID(PAYMT1P) LIST(LABEL('P.PAYMT1.TLS.2025'))" > current_cert.txt
# Pipe to orchestrator which fills the template and calls the LLM API
/* SERVICENOW — where output goes */
# Orchestrator creates Change Request:
# short_description: "AI Renewal JCL: {{NEW_LABEL}}"
# u_jcl_content: <base64 JCL>
# u_zowe_dataset: "{{OWNER_ID}}.AI.RENEW.JCL"
# state: "pending_approval"
Inventory Extraction from RACDCERT LIST Output
Use when: nightly scan spool output needs to be parsed into the inventory database. Output: JSON array of certificate records.
/* SYSTEM PROMPT — P-02 */
You are a structured data extraction assistant for z/OS RACF output.
Extract all certificate records from the RACDCERT LIST spool output provided.
Output a JSON array. Each object must contain exactly these fields
(use null for absent fields — do not omit fields):
label, owner_id, subject_dn, issuer_dn, not_before, not_after,
key_type, key_size, serial_number, trust_status, is_self_signed,
connected_rings (string array)
Output ONLY valid JSON. No prose, no markdown fences, no trailing commas.
/* USER PROMPT TEMPLATE — P-02 */
Extract all certificates from the following RACDCERT LIST output:
<SPOOL_OUTPUT>
{{SPOOL_TEXT}}
</SPOOL_OUTPUT>
/* HOW TO FEED REAL DATA — Zowe CLI */
# Retrieve nightly scan spool:
zowe jobs list jobs --owner CERTSCAN --prefix CERTSCAN | head -1
zowe jobs view spool-file-by-id JOB99001 2 > spool_output.txt
# Call LLM extraction API:
curl -s -X POST "$LLM_API_URL" -H "Authorization: Bearer $LLM_API_KEY" -d '{"system": "<P-02 system prompt>", "user": "<P-02 user prompt with spool_output.txt content>"}' | jq '.result' > extracted_certs.json
/* SERVICENOW — where output goes */
# Orchestrator upserts extracted_certs.json into inventory DB.
# Any cert with not_after < today+60 and no open ticket:
# → auto-raise Incident via ServiceNow REST API (see Section 11.2)
Root-Cause Analysis from JES Output
Use when: engineer pastes an error code and JES output during an incident. Output: structured diagnosis with Zowe CLI remediation steps and ServiceNow work note draft.
/* SYSTEM PROMPT — P-03 */
You are a z/OS RACF diagnostic assistant with deep knowledge of
RACF certificate management, AT-TLS, CICS, MQ, and z/OS Connect.
When given an error code and JES/spool output:
1. Identify the root cause from the error code and context.
2. List the exact Zowe CLI commands to diagnose and remediate, in order.
3. State whether a SETROPTS RACLIST(DIGTCERT) REFRESH is required.
4. Draft a ServiceNow work note (2-3 sentences, plain English).
5. Flag if a Security Incident should be raised (e.g., unauthorised change).
Format your response as JSON with keys:
root_cause, zowe_commands (array), refresh_required (bool),
snow_work_note, raise_security_incident (bool), confidence (high|medium|low)
/* USER PROMPT TEMPLATE — P-03 */
Error code: {{ERROR_CODE}}
Application: {{APP_ID}}
Ring: {{RING_NAME}}
ServiceNow incident: {{INC_NUMBER}}
JES output:
<JES_OUTPUT>
{{JES_TEXT}}
</JES_OUTPUT>
/* HOW TO FEED REAL DATA — Zowe CLI */
# Get JES output for the failing job:
zowe jobs view spool-file-by-id JOB12345 2 > jes_output.txt
/* SERVICENOW — where output goes */
# Orchestrator posts snow_work_note to the incident:
# curl -X PATCH .../incident/{{INC_NUMBER}} \
#   -d '{"work_notes": "<snow_work_note from LLM>"}'
# If raise_security_incident == true:
# Orchestrator creates a linked Security Incident with P1 priority
CSR Field Policy Validator
Use when: engineer submits a certificate request. Output: pass/fail against policy with specific violation details before JCL is generated.
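Most of the rules listed in the P-04 system prompt are mechanical, so a deterministic pre-check could run alongside the prompt and return the same output shape. The sketch below is an illustration only: the function name, the rule identifiers, the ISO date format for NOTAFTER, and the choice to read the APPID from the second segment of the ring name are all assumptions. It also checks only that KEYUSAGE is one of the two listed values, since the request carries no field that says whether the certificate is for TLS or encryption.

```python
from datetime import date, timedelta

ALLOWED_KEYUSAGE = {"HANDSHAKE", "DATAENCRYPT"}


def validate_request(req, approved_ous, cmdb_app_ids, today):
    """Return {"passed": bool, "violations": [...]} in the P-04 output shape."""
    violations = []

    def fail(field, rule, message):
        violations.append({"field": field, "value": req.get(field),
                           "rule": rule, "message": message})

    if not str(req.get("CN", "")).endswith(".internal.bank.com"):
        fail("CN", "cn_suffix", "CN must end in .internal.bank.com")
    if req.get("OU") not in approved_ous:
        fail("OU", "ou_approved", "OU is not in the approved OU list")
    if req.get("O") != "First National Bank":
        fail("O", "org_exact", "O must be exactly 'First National Bank'")
    if req.get("C") != "US":
        fail("C", "country", "C must be 'US'")

    try:
        # Assumption: NOTAFTER is supplied as YYYY-MM-DD.
        not_after = date.fromisoformat(req.get("NOTAFTER", ""))
    except (TypeError, ValueError):
        fail("NOTAFTER", "notafter_format", "NOTAFTER must be YYYY-MM-DD")
    else:
        if not_after > today + timedelta(days=365):
            fail("NOTAFTER", "notafter_max", "NOTAFTER exceeds today + 365 days")

    if req.get("KEYUSAGE") not in ALLOWED_KEYUSAGE:
        fail("KEYUSAGE", "keyusage", "KEYUSAGE must be HANDSHAKE or DATAENCRYPT")

    # Ring schema: {ENV}.{APPID}.{RUNTIME}.{PURPOSE}
    parts = str(req.get("Ring", "")).split(".")
    if len(parts) != 4 or not all(parts):
        fail("Ring", "ring_schema", "Ring must match ENV.APPID.RUNTIME.PURPOSE")
    elif parts[1] not in cmdb_app_ids:
        fail("Ring", "appid_in_cmdb", "APPID in ring name is not in the CMDB")

    return {"passed": not violations, "violations": violations}
```

Running a check like this before the LLM call would give the orchestrator a reproducible baseline against which the model's verdict can be compared.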
/* SYSTEM PROMPT — P-04 */
You are a certificate policy validation assistant for a regulated bank.
Validate the submitted certificate request fields against these rules:
- CN must be a fully qualified domain name ending in .internal.bank.com
- OU must match an entry in the approved OU list (provided below)
- O must be exactly "First National Bank"
- C must be "US"
- NOTAFTER must not exceed 365 days from today ({{TODAY}})
- KEYUSAGE must be HANDSHAKE for TLS, DATAENCRYPT for encryption certs
- Ring name must match {ENV}.{APPID}.{RUNTIME}.{PURPOSE} schema
- APPID must exist in the CMDB application registry (list provided below)
Approved OUs: {{APPROVED_OU_LIST}}
CMDB app IDs: {{CMDB_APPID_LIST}}
Return JSON: { passed: bool, violations: [{ field, value, rule, message }] }
/* USER PROMPT TEMPLATE — P-04 */
Validate this certificate request:
Owner ID: {{OWNER_ID}}
CN: {{CN}}
OU: {{OU}}
O: {{ORG}}
C: {{COUNTRY}}
NOTAFTER: {{NOTAFTER_DATE}}
KEYUSAGE: {{KEYUSAGE}}
Ring: {{RING_NAME}}
/* HOW TO FEED REAL DATA — Zowe CLI */
# Fetch live CMDB app IDs for injection into system prompt:
zowe files download data-set "SECURITY.CMDB.APPIDS" --file cmdb_appids.txt
/* SERVICENOW — where output goes */
# If passed == false: orchestrator blocks CR creation and returns
# violations to the engineer in the Virtual Agent chat.
# If passed == true: orchestrator proceeds to P-01 JCL generation
# and creates the Change Request.
SOP / Runbook Drafter from Incident Post-Mortem
Use when: incident is resolved and a new or updated SOP is needed. Output: structured SOP draft for security architect review before publishing to the knowledge base.
/* SYSTEM PROMPT — P-05 */
You are a technical writer specialising in z/OS RACF security operations.
Given an incident post-mortem, draft a Standard Operating Procedure (SOP)
that prevents recurrence.
The SOP must follow this structure:
Title, SOP-ID, Version, Last Updated, Applies To
1. Trigger conditions
2. Pre-requisites (access, tools: Zowe CLI version, ServiceNow access)
3. Step-by-step procedure (each step: action, Zowe CLI command, expected output)
4. Verification steps
5. Rollback procedure
6. ServiceNow: how to update the incident/change record at each stage
7. Escalation path if procedure fails
Output plain text in the above structure. No markdown.
/* USER PROMPT TEMPLATE — P-05 */
Draft a new SOP from the following incident post-mortem:
Incident: {{INC_NUMBER}}
Title: {{INC_TITLE}}
Root cause: {{ROOT_CAUSE}}
Resolution steps taken: {{RESOLUTION_STEPS}}
Time to resolve: {{TTR}}
Prevention recommendation: {{PREVENTION}}
/* HOW TO FEED REAL DATA — ServiceNow */
# Fetch post-mortem fields from closed incident:
# curl -X GET ".../incident?sysparm_query=number={{INC_NUMBER}}&sysparm_fields=close_notes,work_notes"
# Pipe close_notes + work_notes into the user prompt template
/* SERVICENOW — where output goes */
# Drafted SOP is attached to the incident as a Knowledge Article draft.
# Security architect reviews and approves via ServiceNow Knowledge workflow.
# On approval: SOP is published to the knowledge base AND
# ingested into the RAG vector store (see Section 5.5.4).
Prompt library governance
- All prompts are version-controlled in the same repository as the orchestrator code — changes go through code review.
- System prompts are injected server-side by the orchestrator; engineers cannot modify them at runtime.
- Each prompt has a designated owner (RACF Security Engineering) who reviews it quarterly or after any model version change.
- Prompt outputs are validated against a JSON schema before use; malformed or out-of-schema outputs are discarded and logged.
- Every prompt invocation is logged with: prompt ID, model version, input hash, output hash, approver, and ServiceNow CR/INC reference.
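To make the schema-validation rule concrete, the sketch below checks a P-02 response against the field list given in the P-02 system prompt, using only the standard library. The function name and error messages are hypothetical, and a production pipeline would more likely use a full JSON Schema validator; this is a minimal stand-in for that step.

```python
import json

# Field list copied from the P-02 system prompt.
P02_FIELDS = {
    "label", "owner_id", "subject_dn", "issuer_dn", "not_before", "not_after",
    "key_type", "key_size", "serial_number", "trust_status", "is_self_signed",
    "connected_rings",
}


def parse_p02_output(raw):
    """Return the list of certificate records, or raise ValueError."""
    try:
        records = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"output is not valid JSON: {exc}") from exc
    if not isinstance(records, list):
        raise ValueError("top-level value must be a JSON array")
    for index, record in enumerate(records):
        # P-02 requires exactly these fields; absent values must be null.
        if not isinstance(record, dict) or set(record) != P02_FIELDS:
            raise ValueError(f"record {index}: fields do not match the P-02 schema")
        rings = record["connected_rings"]
        if rings is not None and not (
            isinstance(rings, list) and all(isinstance(r, str) for r in rings)
        ):
            raise ValueError(f"record {index}: connected_rings must be a string array")
    return records
```

The caller would catch `ValueError`, discard the output, and write the log entry described in the list above.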
14. RRSF & Multi-LPAR Certificate Replication
In a large-bank z/OS environment, certificates and key rings defined on one LPAR must be consistently replicated to every other LPAR in the sysplex. The RACF Remote Sharing Facility (RRSF) is the mechanism that propagates RACF commands — including RACDCERT operations — across LPARs automatically, without requiring engineers to log on to each system individually. Getting RRSF right is critical: a certificate that exists on LPAR1 but not LPAR2 will cause intermittent TLS handshake failures that are notoriously difficult to diagnose.
14.1 RRSF Topology and Node Configuration
RRSF operates in a peer-to-peer topology. Each LPAR is an RRSF node. Nodes are defined in the RACF database and communicate over TCP/IP. There are two replication modes: AUTODIRECT (commands issued on one node are automatically replicated to all defined peer nodes) and DIRECT (engineer explicitly targets a specific node). For certificate management at scale, AUTODIRECT is the standard — it ensures that every RACDCERT operation is applied sysplex-wide without manual intervention.
/* ── Define RRSF nodes (run once on each LPAR, by security admin) ── */
/* On LPAR1 (SYSA): define peer node SYSB */
RDEFINE FACILITY RRSF.NODE.SYSB.TCPIP UACC(NONE)
PERMIT RRSF.NODE.SYSB.TCPIP CLASS(FACILITY) ID(RACFUSER) ACCESS(READ)
/* Activate RRSF with AUTODIRECT for certificate operations */
RRSF AUTODIRECT(YES) VERIFYUSER(YES)
/* Verify node status */
RRSF LIST
/* Expected output:
NODE: SYSB STATUS: CONNECTED MODE: AUTODIRECT
NODE: SYSC STATUS: CONNECTED MODE: AUTODIRECT */
/* ── Zowe CLI: check RRSF node status from pipeline ── */
zowe tso issue command "RRSF LIST"
/* Parse output to confirm all nodes are CONNECTED before
submitting any multi-LPAR certificate change */
14.2 AUTODIRECT Certificate Operations
When AUTODIRECT is active, a RACDCERT command issued on any node is automatically replicated to all peer nodes. However, the replication is asynchronous — the command completes locally first, then propagates. The pipeline must verify replication on each target LPAR before declaring the operation complete. A SETROPTS RACLIST(DIGTCERT) REFRESH must also be issued on each LPAR after replication to activate the new certificate in memory.
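Because replication is asynchronous, the verification step amounts to a bounded poll per target LPAR. The sketch below shows one way to structure that loop, using the 15-second retry and three-attempt limit that the flow in this section quotes as defaults. The function name and the `cert_exists` callable are hypothetical; in practice `cert_exists` would wrap the per-LPAR RACDCERT LIST call.

```python
import time


def wait_for_replication(lpars, cert_exists, attempts=3, delay_seconds=15.0,
                         sleep=time.sleep):
    """Poll each LPAR until the certificate is visible.

    Returns the LPARs where the certificate is still missing after the
    final attempt (an empty list means replication was verified everywhere).
    """
    pending = list(lpars)
    for attempt in range(1, attempts + 1):
        # Re-check only the LPARs that have not confirmed yet.
        pending = [lpar for lpar in pending if not cert_exists(lpar)]
        if not pending:
            return []
        if attempt < attempts:
            sleep(delay_seconds)
    return pending
```

Passing `sleep` as a parameter keeps the loop testable without real delays; any LPAR returned by the function is a candidate for the replication-gap incident described later in this section.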
/* ── Standard multi-LPAR cert renewal flow with AUTODIRECT ── */
/* Step 1: Issue RACDCERT on primary LPAR (SYSA) — AUTODIRECT replicates */
/* RACDCERT expects NOTAFTER dates as yyyy-mm-dd; the command is passed as a single line */
zowe tso issue command "RACDCERT ID(PAYMT1P) GENCERT SUBJECTSDN(CN('paymt1.internal.bank.com') O('First National Bank') C('US')) WITHLABEL('P.PAYMT1.TLS.2027') NOTAFTER(DATE(2027-03-05)) KEYUSAGE(HANDSHAKE) SIGNWITH(CERTAUTH LABEL('INTERNAL ROOT CA'))"
/* Step 2: Wait for RRSF propagation (typically 2–10 seconds) */
/* Pipeline: poll with 15-second retry, max 3 attempts */
/* Step 3: Verify cert exists on each LPAR */
zowe tso issue command "RACDCERT ID(PAYMT1P) LIST(LABEL('P.PAYMT1.TLS.2027'))" --host SYSB-ZOSMF.bank.com
zowe tso issue command "RACDCERT ID(PAYMT1P) LIST(LABEL('P.PAYMT1.TLS.2027'))" --host SYSC-ZOSMF.bank.com
/* Each call targets a different LPAR's z/OSMF endpoint through the --host option */
/* Step 4: Run SETROPTS REFRESH on EVERY LPAR */
for LPAR in SYSA SYSB SYSC; do
  zowe tso issue command "SETROPTS RACLIST(DIGTCERT) REFRESH" --host "${LPAR}-ZOSMF.bank.com"
done
/* Step 5: Verify ring connection on each LPAR */
for LPAR in SYSA SYSB SYSC; do
  zowe tso issue command "RACDCERT ID(PAYMT1P) LISTRING(P.PAYMT1.ATTLS.TLS)" --host "${LPAR}-ZOSMF.bank.com"
done
14.3 Consistency Verification Checks
The nightly inventory scan (Section 5.3) must run against every LPAR independently and compare results. Any label present on SYSA but absent on SYSB is a replication gap that must be investigated and remediated before the next business day. The AI inventory parser (Prompt P-02) can be run per-LPAR and the results diffed programmatically.
# ── Nightly multi-LPAR consistency check ──────────────────────────
for LPAR in SYSA SYSB SYSC; do
  HOST="${LPAR}-ZOSMF.bank.com"
  # Retrieve spool from each LPAR's CERTSCAN job
  JOBID=$(zowe jobs list jobs --owner CERTSCAN --prefix CERTSCAN --host "$HOST" | head -1 | awk '{print $1}')
  zowe jobs view spool-file-by-id "$JOBID" 2 --host "$HOST" > "inventory_${LPAR}.txt"
  # Run P-02 LLM extraction per LPAR; jq builds the JSON body so the spool text is escaped correctly
  jq -n --arg prompt "<P-02 system prompt>" --rawfile input "inventory_${LPAR}.txt" '{prompt: $prompt, input: $input}' | curl -s -X POST "$LLM_API_URL" -H "Content-Type: application/json" -d @- | jq '.result' > "certs_${LPAR}.json"
done
# Diff: find labels on SYSA not present on SYSB or SYSC
python3 - <<'EOF'
import json
with open('certs_SYSA.json') as f: sysa = {c['label'] for c in json.load(f)}
with open('certs_SYSB.json') as f: sysb = {c['label'] for c in json.load(f)}
with open('certs_SYSC.json') as f: sysc = {c['label'] for c in json.load(f)}
# Labels on SYSA that are missing from SYSB or from SYSC
gaps = (sysa - sysb) | (sysa - sysc)
if gaps:
    print("REPLICATION GAPS:", gaps)  # → trigger ServiceNow incident
EOF
# ServiceNow: raise Incident for each replication gap
# curl -X POST .../incident -d '{
# "short_description": "RRSF replication gap: <label> missing on <LPAR>",
# "urgency": "2", "impact": "2",
# "assignment_group": "RACF Security Engineering"
# }'
14.4 Multi-LPAR Changes in ServiceNow
A certificate change that spans multiple LPARs must be represented as a single ServiceNow Change Request with child tasks for each LPAR. This gives the change manager a single approval gate while giving the RACF team per-LPAR execution tracking and rollback granularity.
| ServiceNow record | Purpose | Key fields |
|---|---|---|
| Parent Change Request | Single approval gate for the entire multi-LPAR operation | u_lpar_scope: SYSA,SYSB,SYSC; u_ai_generated: true |
| Child Task: SYSA | Execution tracking for primary LPAR | u_lpar: SYSA; u_jes_job_id; u_rc; state |
| Child Task: SYSB | Execution tracking for secondary LPAR | u_lpar: SYSB; u_rrsf_verified: true/false |
| Child Task: SYSC | Execution tracking for tertiary LPAR | u_lpar: SYSC; u_rrsf_verified: true/false |
| Consistency Check Task | Post-execution diff result | u_gaps_found: 0; u_diff_output (attached) |
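The record structure in the table can be assembled programmatically before it is posted to ServiceNow. The sketch below builds the parent, per-LPAR child, and consistency-check payloads as plain dictionaries. The `u_` field names come from the table; the function name, the `short_description` wording, and the assumption that the first LPAR in the list is the primary are illustrative only.

```python
def build_change_records(label, lpars):
    """Build parent, child-task, and consistency-check payloads for one change."""
    parent = {
        "short_description": f"Multi-LPAR certificate change: {label}",
        "u_lpar_scope": ",".join(lpars),
        "u_ai_generated": True,
    }
    children = []
    for index, lpar in enumerate(lpars):
        task = {"short_description": f"{label} on {lpar}", "u_lpar": lpar}
        if index == 0:
            # Primary LPAR: the job is submitted here, so track job ID and RC.
            task.update({"u_jes_job_id": None, "u_rc": None})
        else:
            # Other LPARs receive the change through RRSF replication.
            task["u_rrsf_verified"] = False
        children.append(task)
    consistency_check = {
        "short_description": f"Consistency check: {label}",
        "u_gaps_found": None,  # filled in after the post-execution diff
    }
    return {"parent": parent, "children": children,
            "consistency_check": consistency_check}
```

Keeping the payloads as data until the last moment makes it straightforward to log exactly what was sent alongside the approval record.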
15. Secrets Vault Integration
The AI automation pipeline requires credentials to call the Zowe CLI (z/OSMF username and password or client certificate), the ServiceNow REST API (OAuth token), and the LLM API (API key). None of these secrets may be stored in pipeline configuration files, environment variable files, or source code repositories. A dedicated secrets vault — HashiCorp Vault or CyberArk Conjur — is the only approved storage location. The vault integrates with both Zowe and ServiceNow so that credentials are retrieved at runtime, used once, and never written to disk.
15.1 HashiCorp Vault Integration
HashiCorp Vault is widely deployed in financial institutions as the enterprise secrets manager. The orchestrator authenticates to Vault using its own AppRole (a machine identity), retrieves the z/OSMF and ServiceNow credentials, uses them for the current pipeline run, and discards them. Vault's dynamic secrets feature can generate short-lived z/OSMF credentials that expire automatically after the job completes.
# ── HashiCorp Vault: orchestrator secrets retrieval flow ──────────
# 1. Orchestrator authenticates to Vault using AppRole
VAULT_TOKEN=$(curl -s -X POST "$VAULT_ADDR/v1/auth/approle/login" -d "{\"role_id\": \"$APPROLE_ROLE_ID\", \"secret_id\": \"$APPROLE_SECRET_ID\"}" | jq -r '.auth.client_token')
# 2. Retrieve z/OSMF credentials (short-lived, 1-hour TTL)
ZOSMF_CREDS=$(curl -s -H "X-Vault-Token: $VAULT_TOKEN" $VAULT_ADDR/v1/secret/data/racf-pipeline/zosmf)
ZOSMF_USER=$(echo $ZOSMF_CREDS | jq -r '.data.data.username')
ZOSMF_PASS=$(echo $ZOSMF_CREDS | jq -r '.data.data.password')
# 3. Retrieve ServiceNow OAuth token
SN_TOKEN=$(curl -s -H "X-Vault-Token: $VAULT_TOKEN" $VAULT_ADDR/v1/secret/data/racf-pipeline/servicenow | jq -r '.data.data.oauth_token')
# 4. Retrieve LLM API key
LLM_KEY=$(curl -s -H "X-Vault-Token: $VAULT_TOKEN" $VAULT_ADDR/v1/secret/data/racf-pipeline/llm | jq -r '.data.data.api_key')
# 5. Pass the retrieved credentials to Zowe CLI through environment variables
#    (ZOWE_OPT_* values are read at run time; nothing is written to zowe.config.json)
export ZOWE_OPT_USER="$ZOSMF_USER"
export ZOWE_OPT_PASSWORD="$ZOSMF_PASS"
# 6. Run the pipeline (Zowe CLI, ServiceNow REST calls, LLM API)
# ... pipeline steps ...
# 7. Revoke the Vault token after pipeline completes (zero standing access)
curl -s -X POST -H "X-Vault-Token: $VAULT_TOKEN" $VAULT_ADDR/v1/auth/token/revoke-self
# 8. Clear the credentials from the process environment
unset ZOWE_OPT_USER ZOWE_OPT_PASSWORD ZOSMF_USER ZOSMF_PASS SN_TOKEN LLM_KEY VAULT_TOKEN
Vault secret paths used by the pipeline
| Vault path | Contents |
|---|---|
| secret/data/racf-pipeline/zosmf | z/OSMF username + password per LPAR |
| secret/data/racf-pipeline/servicenow | ServiceNow OAuth token (rotated daily) |
| secret/data/racf-pipeline/llm | LLM API key (rotated monthly) |
| secret/data/racf-pipeline/pki-ca | Internal CA credentials for CSR signing |
| pki/issue/racf-automation | Dynamic x.509 cert for pipeline mTLS auth |
15.2 CyberArk Conjur Integration
CyberArk is the dominant Privileged Access Management (PAM) platform in large banks. Conjur is CyberArk's secrets manager for DevOps and automation workloads. The integration pattern is the same as Vault: the orchestrator authenticates with a machine identity (Conjur Host), retrieves credentials, uses them, and discards them. CyberArk's Central Policy Manager (CPM) rotates z/OSMF passwords automatically on a schedule, so the pipeline always retrieves the current value rather than relying on a static credential.
# ── CyberArk Conjur: orchestrator secrets retrieval ───────────────
# 1. Authenticate orchestrator host identity to Conjur
CONJUR_TOKEN=$(curl -s -X POST "$CONJUR_URL/authn/$CONJUR_ACCOUNT/$CONJUR_HOST_ID/authenticate" --data-binary "$CONJUR_API_KEY" | base64 | tr -d '\r\n')
# 2. Retrieve z/OSMF password (CPM-managed, auto-rotated)
ZOSMF_PASS=$(curl -s -H "Authorization: Token token=\"$CONJUR_TOKEN\"" "$CONJUR_URL/secrets/$CONJUR_ACCOUNT/variable/racf-pipeline%2Fzosmf%2Fpassword")
# 3. Retrieve ServiceNow OAuth token
SN_TOKEN=$(curl -s -H "Authorization: Token token=\"$CONJUR_TOKEN\"" "$CONJUR_URL/secrets/$CONJUR_ACCOUNT/variable/racf-pipeline%2Fsn%2Foauth-token")
# 4. Retrieve LLM API key
LLM_KEY=$(curl -s -H "Authorization: Token token=\"$CONJUR_TOKEN\"" "$CONJUR_URL/secrets/$CONJUR_ACCOUNT/variable/racf-pipeline%2Fllm%2Fapi-key")
# 5. CyberArk also integrates with ServiceNow via the CyberArk App for ServiceNow:
# ServiceNow Flow Designer retrieves credentials from CyberArk at runtime
# using the CyberArk MID Server plugin — no secrets stored in ServiceNow.
# 6. Zowe team profile references CyberArk credential provider:
# zowe.config.json:
# {
# "profiles": {
# "racf-prod": {
# "properties": {
# "host": "SYSA-ZOSMF.bank.com",
# "port": 443
# },
# "secure": ["user", "password"] ← resolved from CyberArk at runtime
# }
# }
# }
15.3 Vault / CyberArk Integration with ServiceNow
ServiceNow itself must not store z/OSMF credentials. When ServiceNow Flow Designer triggers a Zowe CLI call via the MID Server, the MID Server retrieves the credential from the vault at the moment of execution. The ServiceNow Connection Alias stores only the vault path, not the secret value.
/* ── ServiceNow: vault-backed credential flow ── */
/* 1. ServiceNow Connection Alias (no secrets stored) */
{
"name": "RACF_ZOSMF_SYSA",
"type": "HTTP",
"host": "SYSA-ZOSMF.bank.com",
"port": 443,
"credential": "CyberArk_RACF_ZOSMF" /* ← references CyberArk, not a password */
}
/* 2. CyberArk App for ServiceNow resolves the credential at runtime:
ServiceNow Flow Designer → CyberArk MID Server plugin
→ CyberArk Vault → returns current z/OSMF password
→ MID Server uses it for the Zowe CLI call
→ password never written to ServiceNow database */
/* 3. Vault audit log entry (HashiCorp Vault example; secret values are redacted):
{
"time": "2026-03-05T14:32:01Z",
"type": "response",
"auth": { "display_name": "racf-orchestrator", "policies": ["racf-pipeline"] },
"request": { "path": "secret/data/racf-pipeline/zosmf", "operation": "read" },
"response": { "data": { "keys": ["username", "password"] } }
}
*/
/* Every secret retrieval is logged with: who, what, when, from which IP */
15.4 Secrets Governance Controls
| Control | Requirement | Vault implementation | CyberArk implementation |
|---|---|---|---|
| No static credentials | Pipeline must not store secrets at rest | AppRole + short-lived tokens | Conjur Host + CPM rotation |
| Least privilege | Pipeline retrieves only secrets it needs | Vault policy scoped to secret paths | Conjur policy scoped to variables |
| Automatic rotation | z/OSMF passwords rotated ≤ 90 days | Vault dynamic secrets (1-hour TTL) | CPM rotates on schedule |
| Audit trail | Every retrieval logged with requester identity | Vault audit log → SIEM | CyberArk Vault audit → SIEM |
| Break-glass access | Emergency human access with dual approval | Vault emergency policy + MFA | CyberArk PVWA dual-control |
| ServiceNow integration | No secrets stored in ServiceNow | Vault Agent on MID Server | CyberArk App for ServiceNow |
| Zowe integration | No secrets in zowe.config.json | Vault Agent injects at runtime | CyberArk credential provider plugin |
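The "no secrets in zowe.config.json" control lends itself to an automated check in the pipeline's pre-flight step. The sketch below walks an already-parsed team configuration and reports any `user` or `password` value stored in plain text under a profile's `properties`. The function name and the default key list are assumptions. It takes a dictionary rather than a file path because a real zowe.config.json may contain comments that `json.load` rejects.

```python
def find_plaintext_secrets(config, sensitive=("user", "password")):
    """Return dotted paths of sensitive properties stored in plain text."""
    findings = []

    def walk(profiles, prefix):
        for name, profile in profiles.items():
            path = f"{prefix}{name}"
            for key, value in profile.get("properties", {}).items():
                if key in sensitive and value not in (None, ""):
                    findings.append(f"{path}.properties.{key}")
            # Team configuration allows profiles to be nested inside profiles.
            walk(profile.get("profiles", {}), f"{path}.profiles.")

    walk(config.get("profiles", {}), "profiles.")
    return findings
```

An empty result means the sensitive values are either absent or listed only by name in a `secure` array, which is the arrangement the table requires.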
16. Interactive Prompt Tester
Select a prompt from the library (Section 13), fill in the template variables below, and the fully rendered prompt will appear in the output panel — ready to copy into your AI assistant or orchestrator. This eliminates transcription errors when adapting prompts for new applications.
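The rendering step described above is a plain placeholder substitution. The sketch below shows one way to implement it so that an unfilled variable fails loudly instead of reaching the model as literal `{{NAME}}` text. The function name is hypothetical, and the pattern assumes placeholders use upper-case letters and underscores, as in the library templates.

```python
import re

_PLACEHOLDER = re.compile(r"\{\{([A-Z_]+)\}\}")


def render_prompt(template, values):
    """Fill {{NAME}} placeholders; raise KeyError if any are left unfilled."""
    missing = sorted({name for name in _PLACEHOLDER.findall(template)
                      if name not in values})
    if missing:
        raise KeyError(f"unfilled template variables: {', '.join(missing)}")
    # A function replacement avoids backslash handling in substituted values.
    return _PLACEHOLDER.sub(lambda m: str(values[m.group(1)]), template)
```

For example, `render_prompt("Owner ID: {{OWNER_ID}}", {"OWNER_ID": "PAYMT1P"})` yields `Owner ID: PAYMT1P`.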
Generate a certificate renewal JCL for:
Owner ID: {{OWNER_ID}}
Label: {{NEW_LABEL}}
CN: {{FQDN}}
OU: {{OU}}
O: {{ORG}}
C: {{COUNTRY}}
Ring: {{RING_NAME}}
Valid until: {{NOTAFTER_DATE}}
Key usage: {{KEYUSAGE}}
Replace old label: {{OLD_LABEL}} (connect new as DEFAULT, leave old connected)
17. Certificate Expiry Dashboard
A live view of the certificate estate by expiry horizon. In production this data is fed from the nightly inventory database (Section 5.3). The sample dataset below represents a typical large-bank estate of 10,000+ certificates and illustrates the distribution patterns, SLA breach zones, and renewal pipeline health that engineers and management need to see at a glance.
[Chart: Certificates by expiry tier]
[Chart: Expiries by week (next 12 weeks)]
| Label | Owner | Ring | Days left | Tier | Pipeline | SLA |
|---|---|---|---|---|---|---|
| P.PAYMT1.TLS.2024 | PAYMT1P | P.PAYMT1.ATTLS.TLS | -12d | Expired | – No | Immediate incident |
| P.CICS1.TLS.2024 | CICS1P | P.CICS1.ATTLS.TLS | -3d | Expired | – No | Immediate incident |
| P.MQ1.TLS.2026A | MQ1P | P.MQ1.ATTLS.TLS | 4d | Critical | – No | P1 — renew within 24 h |
| P.API1.TLS.2026 | API1P | P.API1.ATTLS.TLS | 11d | Critical | ✓ Yes | P1 — renew within 24 h |
| P.BATCH1.TLS.2026 | BATCH1P | P.BATCH1.ATTLS.TLS | 18d | Critical | – No | P1 — renew within 24 h |
| P.CICS2.TLS.2026 | CICS2P | P.CICS2.ATTLS.TLS | 22d | Critical | ✓ Yes | P1 — renew within 24 h |
| P.ZCON1.TLS.2026 | ZCON1P | P.ZCON1.ATTLS.TLS | 29d | Critical | – No | P1 — renew within 24 h |
| P.PAYMT2.TLS.2026 | PAYMT2P | P.PAYMT2.ATTLS.TLS | 34d | Warning | ✓ Yes | P2 — renew within 7 days |
| P.RISK1.TLS.2026 | RISK1P | P.RISK1.ATTLS.TLS | 41d | Warning | – No | P2 — renew within 7 days |
| P.FRAUD1.TLS.2026 | FRAUD1P | P.FRAUD1.ATTLS.TLS | 47d | Warning | ✓ Yes | P2 — renew within 7 days |
| P.CORE1.TLS.2026 | CORE1P | P.CORE1.ATTLS.TLS | 55d | Warning | – No | P2 — renew within 7 days |
| P.TRADE1.TLS.2026 | TRADE1P | P.TRADE1.ATTLS.TLS | 58d | Warning | ✓ Yes | P2 — renew within 7 days |
| P.LOANS1.TLS.2026 | LOANS1P | P.LOANS1.ATTLS.TLS | 68d | Attention | ✓ Yes | P3 — schedule renewal |
| P.CARDS1.TLS.2026 | CARDS1P | P.CARDS1.ATTLS.TLS | 82d | Attention | – No | P3 — schedule renewal |
| P.SWIFT1.TLS.2026 | SWIFT1P | P.SWIFT1.ATTLS.TLS | 95d | Attention | ✓ Yes | P3 — schedule renewal |
| P.REPO1.TLS.2026 | REPO1P | P.REPO1.ATTLS.TLS | 110d | Attention | – No | P3 — schedule renewal |
| P.AUDIT1.TLS.2026 | AUDIT1P | P.AUDIT1.ATTLS.TLS | 145d | Healthy | – No | No action required |
| P.COMPL1.TLS.2026 | COMPL1P | P.COMPL1.ATTLS.TLS | 198d | Healthy | – No | No action required |
| P.ARCH1.TLS.2027 | ARCH1P | P.ARCH1.ATTLS.TLS | 245d | Healthy | – No | No action required |
| P.INFRA1.TLS.2027 | INFRA1P | P.INFRA1.ATTLS.TLS | 312d | Healthy | – No | No action required |
| P.DEVOPS1.TLS.2027 | DEVOP1P | P.DEVOP1.ATTLS.TLS | 358d | Healthy | – No | No action required |
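The tier column in the table can be derived from the days-left value with a simple threshold function. The cut-offs used below (30, 60, and 120 days) are assumptions chosen to be consistent with every sample row; the exact boundaries are not stated in the table itself, so treat them as placeholders to be replaced with the organisation's own policy values. The function names are likewise hypothetical.

```python
from datetime import date

# Assumed upper bounds (inclusive) for each tier, in days until expiry.
TIER_LIMITS = [(30, "Critical"), (60, "Warning"), (120, "Attention")]


def days_left(not_after, today):
    """Days until expiry; negative once the certificate has expired."""
    return (not_after - today).days


def expiry_tier(days):
    """Map a days-left value to the dashboard tier name."""
    if days < 0:
        return "Expired"
    for limit, tier in TIER_LIMITS:
        if days <= limit:
            return tier
    return "Healthy"
```

With these assumed limits, a certificate with 29 days left falls in Critical and one with 34 days left falls in Warning, matching the sample rows.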
SLA reference