<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:media="http://search.yahoo.com/mrss/"><channel><title><![CDATA[Kubernetes Dive]]></title><description><![CDATA[Thoughts, stories and ideas.]]></description><link>https://blog.ankitjodhani.com/</link><image><url>https://blog.ankitjodhani.com/favicon.png</url><title>Kubernetes Dive</title><link>https://blog.ankitjodhani.com/</link></image><generator>Ghost 5.85</generator><lastBuildDate>Mon, 23 Mar 2026 07:48:04 GMT</lastBuildDate><atom:link href="https://blog.ankitjodhani.com/rss/" rel="self" type="application/rss+xml"/><ttl>60</ttl><item><title><![CDATA[AWS Break Glass Access: The Complete  Guide]]></title><description><![CDATA[This blog explores the Break Glass concept in AWS. An emergency access mechanism for multi-account environments. We'll walk through the approaches, architecture, step-by-step setup, and real-world scenarios.]]></description><link>https://blog.ankitjodhani.com/the-complete-aws-break-glass-implementation-guide/</link><guid isPermaLink="false">69958b8a4bdb1ed3f7757431</guid><category><![CDATA[AWS]]></category><category><![CDATA[Landing Zone]]></category><category><![CDATA[AWS Organization]]></category><category><![CDATA[Best practices]]></category><dc:creator><![CDATA[Ankit Jodhani]]></dc:creator><pubDate>Thu, 19 Feb 2026 05:02:20 GMT</pubDate><media:content url="https://cdn.ankitjodhani.com/2026/02/breakGlass-1.gif" medium="image"/><content:encoded><![CDATA[<img src="https://cdn.ankitjodhani.com/2026/02/breakGlass-1.gif" alt="AWS Break Glass Access: The Complete  Guide"><p></p><h2 id="%F0%9F%99%8B%E2%80%8D%E2%99%82%EF%B8%8F-introduction">&#x1F64B;&#x200D;&#x2642;&#xFE0F; Introduction</h2><p>Hi All, I&apos;m <a href="https://www.linkedin.com/in/ankit-jodhani/?ref=blog.ankitjodhani.com">Ankit Jodhani</a>, a Kubestronaut and was working as a 
Kubernetes Engineer in the past, and an AWS Community Builder. I&apos;m very passionate about Cloud and Container technologies.</p><p>Recently, I came across the concept of &quot;Break Glass&quot; (I know, it&apos;s not new for everyone, but it was new for me) and honestly, it surprised me how critical it is and how rarely people talk about it in detail. So I spent a good amount of time researching and reading AWS docs and blogs; this blog is the result of all that.</p><p>A little promotion: I&apos;m looking for freelancing clients and projects related to Kubernetes, Cloud, and DevOps. Feel free to reach out if you are looking for someone like me.</p><h2 id="%F0%9F%93%9A-synopsis">&#x1F4DA; Synopsis</h2><ul><li>Let&apos;s imagine a scenario: it&apos;s 2 AM, and you receive a phone call from a colleague: the payment service is failing and customers are getting errors.</li><li>You grab your laptop, open the browser, and go to your AWS SSO portal to log in and fix the issue, but the portal shows <strong>&quot;Service Unavailable&quot;. </strong>You try again. Same thing. Your Identity Provider (Okta, Azure AD, whatever you use) is either down, or something is broken with IAM Identity Center.</li><li>Now you&apos;re standing there, fully awake, knowing exactly what to fix, but </li></ul><blockquote><strong>You</strong> <strong>can&apos;t get into your AWS accounts. You are completely locked out.</strong></blockquote><ul><li>This is a hypothetical scenario, but it can happen to real teams at real companies. Teams with a Break Glass mechanism in place can fix the issue and go back to sleep. And teams without one? Well... for them, it can be a very long night.</li></ul><p>In this blog, we&apos;ll explore:</p><h3 id="%F0%9F%94%B8-story">&#x1F538; Story</h3><ul><ul><li>What the normal &quot;Day-2&quot; access flow looks like &amp; what can go wrong with it</li><li>The Break Glass concept: what it actually means 
&amp; why you need it</li><li>The different approaches to implement Break Glass in AWS</li><li>Complete architecture for a production-grade Break Glass setup</li><li>Step-by-step implementation guide</li><li>How to set up alerts and monitoring for Break Glass usage</li><li>Break Glass drill procedure: How to test it</li><li>Real-world emergency scenarios &amp; exactly how Break Glass saves you</li></ul></ul><h2 id="%F0%9F%94%84-normal-access-flow-day-2-operations">&#x1F504; Normal Access Flow (Day-2 Operations)</h2><p>First, let&apos;s understand how <strong>normal</strong> access works, because Break Glass only makes sense when you understand what it&apos;s replacing.</p><p>Here&apos;s how engineers access AWS accounts on a daily basis:</p><pre><code class="language-bash">+------------------+
|     Engineer     |
+------------------+
          |
          v
+------------------+
|     Web Browser  |
+------------------+
          |
          v
+-----------------------------------+
|   Identity Provider (IdP)        |
|  (Okta / Azure AD / Google)      |
+-----------------------------------+
          |
          v
+-----------------------------------+
|        MFA Challenge             |
| (Authenticator App / SMS / etc.) |
+-----------------------------------+
          |
          v
+-----------------------------------+
| SSO Portal - Account &amp; Role List |
+-----------------------------------+
          |
          v
+-----------------------------------+
| Engineer selects:                |
| &quot;Production Account &#x2192; ReadOnly&quot;  |
+-----------------------------------+
          |
          v
+-----------------------------------+
| IAM Identity Center              |
| Assumes IAM Role in Target Acct  |
+-----------------------------------+
          |
          v
+-----------------------------------+
| Production AWS Account           |
| IAM Role: ReadOnly               |
| Temporary Credentials            |
| (1&#x2013;12 Hour Expiry)               |
+-----------------------------------+</code></pre><ul><li>No passwords are stored. No long-lived access keys. No IAM users in member accounts.</li><li>All access is temporary, auditable, and centrally managed through IAM Identity Center.</li></ul><p>This is good. This is the right way. But what happens when this flow <strong>breaks</strong>?</p><h2 id="%E2%9A%A0%EF%B8%8F-what-can-go-wrong">&#x26A0;&#xFE0F; What Can Go Wrong?</h2><table>
<thead>
<tr>
<th><strong>Failure Scenario</strong></th>
<th><strong>Impact</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td>Identity Provider (Okta / Azure AD) is down</td>
<td>No one can authenticate. Complete lockout from all accounts.</td>
</tr>
<tr>
<td>IAM Identity Center service outage (rare but possible)</td>
<td>SSO portal unreachable. No one can assume roles.</td>
</tr>
<tr>
<td>Someone misconfigures an SCP on Root or Workload OU</td>
<td>SCP accidentally denies <code>sts:AssumeRole</code>. Identity Center can&apos;t assume roles in member accounts.</td>
</tr>
<tr>
<td>Identity Provider is compromised by attacker</td>
<td>You need to cut off SSO immediately. But then how does YOUR team access AWS to respond to the incident?</td>
</tr>
</tbody>
</table>
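<p>To make the SCP misconfiguration scenario concrete, the accidental lockout can be as small as one overly broad deny statement. This is purely illustrative (the Sid and scope are hypothetical):</p><pre><code class="language-json">{
  &quot;Version&quot;: &quot;2012-10-17&quot;,
  &quot;Statement&quot;: [
    {
      &quot;Sid&quot;: &quot;HypotheticalOverlyBroadDeny&quot;,
      &quot;Effect&quot;: &quot;Deny&quot;,
      &quot;Action&quot;: &quot;sts:AssumeRole&quot;,
      &quot;Resource&quot;: &quot;*&quot;
    }
  ]
}
</code></pre><p>Attached to the Root OU or a Workload OU, a statement like this blocks role assumption in every account underneath it, including the roles behind your SSO permission sets.</p>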
<ul><li>In <strong>all</strong> of these scenarios, your normal access path is broken. You need an alternative way in. And that alternative is <strong>Break Glass.</strong></li></ul><h2 id="%F0%9F%94%90-the-break-glass-concept">&#x1F510; The Break Glass Concept</h2><ul><li>It is a <strong>pre-established emergency access mechanism</strong> that bypasses the normal authentication &amp; authorization flow for a selected set of people in emergency situations.</li></ul><p>It&apos;s called &quot;Break Glass&quot; because it&apos;s like a fire alarm behind a glass panel: you only break the glass in a real emergency.</p><p><strong>A few considerations:</strong></p><ul><li>&#x1F6AB; Never used for normal day-to-day operations</li><li>&#x2705; Must always be functional and ready</li><li>&#x1F6A8; Must trigger an immediate alert when used</li><li>&#x1F9D8; Must be simple enough to use under pressure</li><li>&#x1F512; Requires authorization; not everyone should have access</li></ul><h2 id="%F0%9F%93%8B-break-glass-approaches">&#x1F4CB; Break Glass Approaches</h2><p>There are 4 main approaches:</p><table>
<thead>
<tr>
<th><strong>Sr No</strong></th>
<th><strong>Approach</strong></th>
<th><strong>What It Is</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>Treat Root User as Break Glass</td>
<td>Secure the management account root user as your last-resort emergency access</td>
</tr>
<tr>
<td>2</td>
<td>Break Glass IAM User in Management Account</td>
<td>Create dedicated IAM users (BreakGlass-1, BreakGlass-2) in the management account with cross-account roles</td>
</tr>
<tr>
<td>3</td>
<td>Dedicated Break Glass Account</td>
<td>Separate AWS account with its own IAM users + cross-account roles into member accounts</td>
</tr>
<tr>
<td>4</td>
<td>Backup Identity Provider</td>
<td>Configure a second IdP as fallback federation source</td>
</tr>
</tbody>
</table>
<p>In this blog we will focus on the <strong>3rd approach</strong> (Dedicated Break Glass Account), as it covers all the other approaches within itself.</p><p>All of them are fairly simple to implement, and the choice depends on the criticality of your workloads and the scale you operate at.</p><p>There are no hard rules about these approaches. You can also design a custom approach based on what you need. These are patterns, not rigid rules.</p><h2 id="%F0%9F%8E%AF-architecture">&#x1F3AF; Architecture</h2><p>Let&apos;s understand the architecture before we jump into the implementation. This will give you a clear picture of what we&apos;re building.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://cdn.ankitjodhani.com/2026/02/breakGlass.gif" class="kg-image" alt="AWS Break Glass Access: The Complete  Guide" loading="lazy" width="1350" height="1800"><figcaption><span style="white-space: pre-wrap;">AWS Break Glass Access: The Complete Guide</span></figcaption></figure><p>Here&apos;s what the architecture looks like:</p><ul><li><strong>Break Glass Account:</strong> A separate dedicated account in its own OU or in the Security OU, with 2 IAM users (<code>BreakGlass-Admin-1</code> and <code>BreakGlass-Admin-2</code>)</li><li><strong>Management Account:</strong> Has 2 Break Glass IAM users (<code>BreakGlass-1</code> and <code>BreakGlass-2</code>) along with the secured root user.</li><li><strong>Every critical member account (not necessarily all accounts):</strong> Has 2 IAM roles (<code>BreakGlassReadOnly</code> and <code>BreakGlassAdmin</code>)<ul><li>These roles trust all 4 Break Glass users (2 from Management + 2 from Break Glass Account)</li><li>They require MFA for assumption</li></ul></li><li>The Break Glass Account has <strong>NO SSO access (</strong>it should be completely disconnected from Identity Center - no one should have SSO access to it)</li><li>The Break Glass Account has <strong>NO workloads (</strong>only CloudTrail and Config 
running)</li><li><strong>CloudTrail + EventBridge</strong> alerts fire whenever any Break Glass user logs in or assumes a role</li></ul><p>The key idea here: we have <strong>3 layers of emergency access</strong>, each independent of the other:</p><pre><code class="language-bash">Layer 1: Break Glass Account IAM Users
         (handles most common emergencies, or is used when the Management Account itself is compromised or broken)

Layer 2: Break Glass IAM Users in Management Account
         (handles emergencies like SSO fix, SCP fix)

Layer 3: Management Account Root User
         (absolute last resort, when everything else fails)</code></pre><h2 id="%F0%9F%9A%80-step-by-step-implementation-guide">&#x1F680; Step-by-Step Implementation Guide</h2><h3 id="%F0%9F%94%B9-step-1-create-the-break-glass-aws-account">&#x1F539; Step 1: Create the Break Glass AWS Account</h3><p>Create a new AWS account through Account Factory or AWS Organizations.</p><ul><li><strong>Account Name:</strong> <code>BreakGlass</code></li><li><strong>Root Email:</strong> <code>aws-breakglass@xyz.com</code> (dedicated email, not shared with anyone else)</li><li><strong>OU Placement:</strong> Security OU (or create a dedicated sub-OU)</li></ul><div class="kg-card kg-callout-card kg-callout-card-blue"><div class="kg-callout-emoji">&#x1F4A1;</div><div class="kg-callout-text">The Break Glass Account must have its own dedicated email address. This email should NOT be shared or aliased with any other account&apos;s email group. Complete isolation.</div></div><p>A few critical things about this account:</p><ul><li>This account <strong>must be disconnected from SSO / Identity Center (</strong>no one should be able to access it via SSO)</li><li><strong>No workloads</strong> should run in this account (only CloudTrail and AWS Config, which are mandatory via Control Tower)</li><li>The SCPs on this account&apos;s OU should NOT block <code>sts:AssumeRole</code> or <code>iam:*</code>; otherwise the Break Glass users won&apos;t be able to assume roles in other accounts</li></ul><h3 id="%F0%9F%94%B9-step-2-create-break-glass-iam-users-in-the-break-glass-account">&#x1F539; Step 2: Create Break Glass IAM Users in the Break Glass Account</h3><p>In the newly created Break Glass Account, create <strong>2 IAM users</strong> with console access:</p><ul><li><code>BreakGlass-Admin-1</code></li><li><code>BreakGlass-Admin-2</code></li></ul><p>For each user:</p><ul><li>a) Create the user with console access</li><li>b) Attach a policy that allows assuming the Break Glass roles in other accounts</li></ul><pre><code 
class="language-json">{
  &quot;Version&quot;: &quot;2012-10-17&quot;,
  &quot;Statement&quot;: [
    {
      &quot;Sid&quot;: &quot;AllowAssumeBreakGlassRoles&quot;,
      &quot;Effect&quot;: &quot;Allow&quot;,
      &quot;Action&quot;: &quot;sts:AssumeRole&quot;,
      &quot;Resource&quot;: [
        &quot;arn:aws:iam::*:role/BreakGlassAdmin&quot;,
        &quot;arn:aws:iam::*:role/BreakGlassReadOnly&quot;
      ]
    }
  ]
}
</code></pre>
<ul><li>c) Add an MFA enforcement policy:<ul><li>This is important: Even if someone gets the password, they can&apos;t do anything without the hardware MFA device.</li></ul></li></ul><pre><code class="language-json">{
  &quot;Version&quot;: &quot;2012-10-17&quot;,
  &quot;Statement&quot;: [
    {
      &quot;Sid&quot;: &quot;DenyAllWithoutMFA&quot;,
      &quot;Effect&quot;: &quot;Deny&quot;,
      &quot;NotAction&quot;: [
        &quot;iam:CreateVirtualMFADevice&quot;,
        &quot;iam:EnableMFADevice&quot;,
        &quot;iam:GetUser&quot;,
        &quot;iam:ListMFADevices&quot;,
        &quot;iam:ListVirtualMFADevices&quot;,
        &quot;iam:ResyncMFADevice&quot;,
        &quot;sts:GetSessionToken&quot;
      ],
      &quot;Resource&quot;: &quot;*&quot;,
      &quot;Condition&quot;: {
        &quot;BoolIfExists&quot;: {
          &quot;aws:MultiFactorAuthPresent&quot;: &quot;false&quot;
        }
      }
    }
  ]
}
</code></pre>
<ul><li>d) Set up hardware MFA:<ul><li>Use a <strong>hardware MFA device</strong> (YubiKey or similar), NOT a phone-based authenticator app</li><li>Register the MFA device on each user</li><li>Label the physical device clearly: <code>BG-ADMIN-1-MFA</code></li></ul></li><li>e) Store the credentials securely:<ul><li>Store passwords in your organization&apos;s security vault (1Password Business, CyberArk, HashiCorp Vault, something that does NOT depend on AWS)</li><li>Store the hardware MFA devices in a physically secure location (office safe, locked cabinet)</li></ul></li></ul><p>&#x1F4DB; Best practice: implement <strong>dual control</strong>. One person holds the password, another person holds the MFA device. Both must be present to use Break Glass. This prevents a single person from having unilateral access.</p><ul><li>These credentials should be shared with <strong>2 credible people</strong> in your organization, typically the <strong>Cloud Platform Lead</strong> and the <strong>CTO</strong></li></ul><p>Now repeat the same for <code>BreakGlass-Admin-2</code>.</p><h3 id="%F0%9F%94%B9-step-3-create-break-glass-iam-users-in-the-management-account">&#x1F539; Step 3: Create Break Glass IAM Users in the Management Account</h3><p>Now create <strong>2 more</strong> Break Glass users, but this time in the <strong>Management Account</strong>:</p><ul><li><code>BreakGlass-1</code></li><li><code>BreakGlass-2</code></li></ul><p>The setup is identical to Step 2: same policies, same MFA, same credential storage practices.</p><div class="kg-card kg-callout-card kg-callout-card-blue"><div class="kg-callout-emoji">&#x1F914;</div><div class="kg-callout-text">You might be wondering: why do we need Break Glass users in BOTH accounts?</div></div><table>
<thead>
<tr>
<th><strong>Scenario</strong></th>
<th><strong>Break Glass Account Users</strong></th>
<th><strong>Management Account Users</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td>Access member accounts when SSO is down</td>
<td>&#x2705; Works</td>
<td>&#x2705; Works (two paths)</td>
</tr>
<tr>
<td>Fix SCPs / AWS Organizations</td>
<td>&#x274C; Must use Management Account root</td>
<td>&#x2705; Use Break Glass IAM user (faster, better audit trail)</td>
</tr>
<tr>
<td>Fix IAM Identity Center (SSO)</td>
<td>&#x274C; Must use Management Account root</td>
<td>&#x2705; Use Break Glass IAM user</td>
</tr>
<tr>
<td>Fix Control Tower</td>
<td>&#x274C; Must use Management Account root</td>
<td>&#x2705; Use Break Glass IAM user</td>
</tr>
</tbody>
</table>
<p><em><strong>Note</strong></em>: <em>The Break Glass Account <strong>cannot manage Organizations, SCPs, or Identity Center; </strong>only the Management Account can. Without Break Glass users in the Management Account, every SCP or SSO issue forces you to use root. And root should be the absolute last resort.</em></p><h3 id="%F0%9F%94%B9-step-4-create-cross-account-roles-in-every-member-account">&#x1F539; Step 4: Create Cross-Account Roles in Every Member Account</h3><ul><li>This is the critical piece that connects everything. In <strong>every critical member account</strong>, create 2 IAM roles:<ul><li><strong>Role 1: <code>BreakGlassReadOnly</code></strong>: For investigation and read-only access</li></ul></li></ul><pre><code class="language-json">{
  &quot;RoleName&quot;: &quot;BreakGlassReadOnly&quot;,
  &quot;MaxSessionDuration&quot;: 14400,
  &quot;AssumeRolePolicyDocument&quot;: {
    &quot;Version&quot;: &quot;2012-10-17&quot;,
    &quot;Statement&quot;: [
      {
        &quot;Sid&quot;: &quot;TrustManagementAccountBreakGlass&quot;,
        &quot;Effect&quot;: &quot;Allow&quot;,
        &quot;Principal&quot;: {
          &quot;AWS&quot;: [
            &quot;arn:aws:iam::MANAGEMENT_ACCOUNT_ID:user/BreakGlass-1&quot;,
            &quot;arn:aws:iam::MANAGEMENT_ACCOUNT_ID:user/BreakGlass-2&quot;
          ]
        },
        &quot;Action&quot;: &quot;sts:AssumeRole&quot;,
        &quot;Condition&quot;: {
          &quot;Bool&quot;: { &quot;aws:MultiFactorAuthPresent&quot;: &quot;true&quot; }
        }
      },
      {
        &quot;Sid&quot;: &quot;TrustBreakGlassAccount&quot;,
        &quot;Effect&quot;: &quot;Allow&quot;,
        &quot;Principal&quot;: {
          &quot;AWS&quot;: [
            &quot;arn:aws:iam::BREAKGLASS_ACCOUNT_ID:user/BreakGlass-Admin-1&quot;,
            &quot;arn:aws:iam::BREAKGLASS_ACCOUNT_ID:user/BreakGlass-Admin-2&quot;
          ]
        },
        &quot;Action&quot;: &quot;sts:AssumeRole&quot;,
        &quot;Condition&quot;: {
          &quot;Bool&quot;: { &quot;aws:MultiFactorAuthPresent&quot;: &quot;true&quot; }
        }
      }
    ]
  },
  &quot;ManagedPolicyArns&quot;: [&quot;arn:aws:iam::aws:policy/ReadOnlyAccess&quot;]
}
</code></pre>
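<p>Because this trust policy must be identical in every member account, it helps to generate it rather than hand-edit account IDs. Here is a minimal Python sketch (the account IDs shown are placeholders, not real values):</p><pre><code class="language-python">import json

# Hypothetical account IDs -- replace with your own.
MANAGEMENT_ACCOUNT_ID = "111111111111"
BREAKGLASS_ACCOUNT_ID = "333333333333"

def break_glass_trust_policy(mgmt_id: str, bg_id: str) -> dict:
    """Build the trust policy that lets all 4 Break Glass users
    (2 per account) assume the role, with MFA required."""
    def statement(sid, account_id, users):
        return {
            "Sid": sid,
            "Effect": "Allow",
            "Principal": {"AWS": [
                f"arn:aws:iam::{account_id}:user/{u}" for u in users
            ]},
            "Action": "sts:AssumeRole",
            "Condition": {"Bool": {"aws:MultiFactorAuthPresent": "true"}},
        }
    return {
        "Version": "2012-10-17",
        "Statement": [
            statement("TrustManagementAccountBreakGlass", mgmt_id,
                      ["BreakGlass-1", "BreakGlass-2"]),
            statement("TrustBreakGlassAccount", bg_id,
                      ["BreakGlass-Admin-1", "BreakGlass-Admin-2"]),
        ],
    }

print(json.dumps(break_glass_trust_policy(
    MANAGEMENT_ACCOUNT_ID, BREAKGLASS_ACCOUNT_ID), indent=2))</code></pre><p>You can feed the generated document into whatever automation stamps the roles into new accounts, so the trust relationships never drift between accounts.</p>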
<ul><ul><li><strong>Role 2: </strong><code>BreakGlassAdmin</code>: For full admin access when you need to fix things<ul><li>Same trust policy as above, but attach <code>AdministratorAccess</code> instead of <code>ReadOnlyAccess</code>.</li></ul></li></ul></ul><div class="kg-card kg-callout-card kg-callout-card-blue"><div class="kg-callout-emoji">&#x1F525;</div><div class="kg-callout-text">Do NOT create these roles manually in every account. Automate it. Use <b><strong style="white-space: pre-wrap;">CloudFormation StackSets</strong></b>, <b><strong style="white-space: pre-wrap;">AFT (Account Factory for Terraform)</strong></b>, or <b><strong style="white-space: pre-wrap;">CfCT (Customizations for Control Tower)</strong></b>. This way, every new account automatically gets these roles.</div></div><h3 id="%F0%9F%94%B9-step-5-setup-alerts-and-monitoring">&#x1F539; Step 5: Set Up Alerts and Monitoring</h3><p>This is non-negotiable. You <strong>MUST</strong> know when anyone uses Break Glass, whether it&apos;s a legitimate emergency or an attacker who got hold of the credentials.</p><p><strong>What to alert on:</strong></p><ul><li>Any console login by Break Glass IAM users</li><li>Any <code>sts:AssumeRole</code> call to the <code>BreakGlassAdmin</code> or <code>BreakGlassReadOnly</code> roles</li><li>Any console login by a root user (any account)</li><li>Any failed login attempts on Break Glass users</li></ul><p>Send notifications to:</p><ul><li>&#x1F4E7; Email: Security team + Cloud Platform Lead</li><li>&#x1F4AC; Slack/Teams: <code>#security-alerts</code> channel</li><li>&#x1F6A8; PagerDuty: High urgency (a Break Glass login is always high urgency)</li></ul><div class="kg-card kg-callout-card kg-callout-card-blue"><div class="kg-callout-emoji">&#x1F4A1;</div><div class="kg-callout-text">These alerts will also be useful when collecting evidence for auditors and for compliance requirements.</div></div><h2
id="%F0%9F%94%A5-emergency-procedure-how-to-actually-use-break-glass">&#x1F525; Emergency Procedure: How to Actually Use Break Glass</h2><ul><li>There should be a complete documented procedure for how to use Break Glass and it should be easily accessible to your team in case of emergency.</li></ul><p>Here&apos;s the exact flow:</p><pre><code class="language-bash">STEP 0: Declare the emergency
  &#x2192; Cloud Lead or CTO approves Break Glass usage
  &#x2192; Notify #incident channel: &quot;Break Glass initiated. Reason: [XYZ]&quot;

STEP 1: Determine which layer you need
  &#x2192; Need to fix SCPs / SSO / Control Tower?
    &#x2192; Use BreakGlass-1 in Management Account 
  &#x2192; Management Account is compromised/broken?
    &#x2192; Use BreakGlass-Admin-1 in Break Glass Account
  &#x2192; Everything else has failed?
    &#x2192; Use Management Account Root (Layer 3)

STEP 2: Retrieve credentials
  &#x2192; Person A retrieves password from the vault
  &#x2192; Person B retrieves hardware MFA device from secure storage
  &#x2192; Both people must be present

STEP 3: Login
  &#x2192; Go to: https://ACCOUNT_ID.signin.aws.amazon.com/console
  &#x2192; Enter IAM username + password + MFA code
  &#x2192; You&apos;re in.

STEP 4: If you need to reach a member account
  &#x2192; Click username (top-right) &#x2192; &quot;Switch Role&quot;
  &#x2192; Enter target Account ID + Role (BreakGlassAdmin or BreakGlassReadOnly)
  &#x2192; You&apos;re now inside the target account

STEP 5: Fix the issue
  &#x2192; Document EVERY action you take (timestamps + what you did + why)

STEP 6: Exit and secure
  &#x2192; Log out. Return MFA devices to storage.
  &#x2192; Notify team: &quot;Break Glass session ended. Normal access restored.&quot;

STEP 7: Post-incident
  &#x2192; Rotate the Break Glass password that was used
  &#x2192; Review CloudTrail logs for the session
  &#x2192; Write incident report
  &#x2192; Conduct post-mortem: Why was Break Glass needed? How to prevent it?
  
</code></pre>
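<p>For the console-login alerts described in Step 5, one option is an EventBridge rule on the default event bus that matches sign-ins by the Break Glass users. A sketch of the event pattern (user names are the ones from this guide; verify the exact field values against a sample event in your own account):</p><pre><code class="language-json">{
  &quot;detail-type&quot;: [&quot;AWS Console Sign In via CloudTrail&quot;],
  &quot;detail&quot;: {
    &quot;eventName&quot;: [&quot;ConsoleLogin&quot;],
    &quot;userIdentity&quot;: {
      &quot;userName&quot;: [&quot;BreakGlass-1&quot;, &quot;BreakGlass-2&quot;,
                   &quot;BreakGlass-Admin-1&quot;, &quot;BreakGlass-Admin-2&quot;]
    }
  }
}
</code></pre><p>Point the rule&apos;s target at an SNS topic that fans out to email, Slack/Teams, and PagerDuty.</p>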
<h2 id="%F0%9F%A7%AA-break-glass-drill">&#x1F9EA; Break Glass Drill</h2><ul><li>As discussed earlier, the Break Glass mechanism must remain functional at all times. To ensure that, you should conduct a <strong>Break Glass drill every 6 months (or on your own schedule)</strong></li><li>Drill checklist:<ul><li>&#x1F518; Notify security team that a drill is starting</li><li>&#x1F518; Retrieve Break Glass credentials from vault</li><li>&#x1F518; Successfully log in as Break Glass user</li><li>&#x1F518; Successfully switch role into a non-production member account</li><li>&#x1F518; Verify alerts fired (security team confirms receipt)</li><li>&#x1F518; Log out and return credentials</li><li>&#x1F518; Rotate the password used during the drill</li><li>&#x1F518; Document results: what worked, what didn&apos;t</li><li>&#x1F518; Update the runbook if anything was unclear</li></ul></li></ul><h2 id="%F0%9F%8E%AC-real-world-scenario-sso-is-down-production-is-on-fire">&#x1F3AC; Real-World Scenario: SSO is Down, Production is on Fire</h2><ul><li>Let me paint a real picture of how all of this comes together:</li></ul><pre><code class="language-bash">2:00 AM &#x2014; PagerDuty fires. Payment service returning 500 errors.

2:02 AM &#x2014; On-call SRE tries SSO portal. &quot;Service Unavailable.&quot;
          Can&apos;t access any AWS account.

2:05 AM &#x2014; SRE escalates to Cloud Lead: &quot;SSO is down. Need Break Glass.&quot;

2:07 AM &#x2014; Cloud Lead approves. Opens 1Password (SaaS &#x2014; not on AWS).
          Retrieves BreakGlass-1 password. Grabs YubiKey from drawer.

2:10 AM &#x2014; Logs into Management Account:
          https://111111111111.signin.aws.amazon.com/console
          Username: BreakGlass-1 | Password: *** | MFA: YubiKey

2:11 AM &#x2014; Switches Role to Production Account:
          Account: 222222222222 | Role: BreakGlassAdmin

2:13 AM &#x2014; Inside Production Account. Investigates the issue.
          Finds bad deployment. Initiates rollback.

2:20 AM &#x2014; Application recovers. 500 errors stop.

2:22 AM &#x2014; Logs out. Returns YubiKey to secure storage.

2:25 AM &#x2014; Posts in #incident: &quot;Production restored. Break Glass ended.&quot;

Next morning:
  &#x2192; Security team reviews CloudTrail logs
  &#x2192; BreakGlass-1 password rotated
  &#x2192; Incident report written
  &#x2192; Post-mortem: Why did SSO go down? How to prevent it?

</code></pre>
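<p>The next-morning CloudTrail review can be partially scripted. As a rough sketch, the snippet below filters CloudTrail records (loaded from a downloaded log file) for actions taken by Break Glass principals; the record layout follows CloudTrail&apos;s standard <code>Records</code> array, and the file name is only an example:</p><pre><code class="language-python">import json

BREAK_GLASS_USERS = {"BreakGlass-1", "BreakGlass-2",
                     "BreakGlass-Admin-1", "BreakGlass-Admin-2"}

def break_glass_events(records: list) -> list:
    """Return (eventTime, eventName, userName) for every CloudTrail
    record made by a Break Glass IAM user, sorted by time."""
    hits = []
    for r in records:
        user = r.get("userIdentity", {}).get("userName")
        if user in BREAK_GLASS_USERS:
            hits.append((r.get("eventTime"), r.get("eventName"), user))
    return sorted(hits)

# Example with inline records; normally you would load a log file:
#   records = json.load(open("trail-log.json"))["Records"]
sample = [
    {"eventTime": "2026-02-19T02:10:00Z", "eventName": "ConsoleLogin",
     "userIdentity": {"type": "IAMUser", "userName": "BreakGlass-1"}},
    {"eventTime": "2026-02-19T02:05:00Z", "eventName": "ListBuckets",
     "userIdentity": {"type": "IAMUser", "userName": "alice"}},
]
print(break_glass_events(sample))
# prints [('2026-02-19T02:10:00Z', 'ConsoleLogin', 'BreakGlass-1')]</code></pre><p>Every line in that output should map to an action documented during the incident; anything unexplained deserves a closer look.</p>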
<h2 id="%F0%9F%99%8C-conclusion">&#x1F64C; Conclusion</h2><p>Break Glass is one of those things you set up hoping you&apos;ll never use. But when you do need it, you&apos;ll be incredibly glad it&apos;s there.</p><p>I tried to cover all the important details and best practices, but covering everything in one blog is obviously not possible.</p><p>And that&apos;s a wrap! &#x1F64C;&#x1F942;</p><p>If you like my work, please message me on LinkedIn with <strong><em>&quot;Hi + your country name&quot;</em></strong></p><ul><li>&#x1F64B;&#x200D;&#x2642;&#xFE0F; Ankit Jodhani (Again, I&apos;m open to Kubernetes, Cloud, and DevOps projects)</li></ul><p>&#x1F4E8; Reach me at <a href="mailto:ankitjodhani1903@gmail.com"><strong>ankitjodhani1903@gmail.com</strong></a></p>
<!--kg-card-begin: html-->
<!DOCTYPE html>
<html lang="en">
<head>
<link href="https://unpkg.com/boxicons@2.1.4/css/boxicons.min.css" rel="stylesheet">
</head>
  <style>
 .social-box {
    display: flex;
    padding: 0px 100px;
    justify-content: space-between;
}
    
 .social-box a {
      font-size: 100px;
      text-decoration: none;
   
    }   
</style>
<body>
  <div class="social-box">
       <a class="social-links-ankit" href="https://www.linkedin.com/in/ankit-jodhani/?ref=blog.ankitjodhani.com"><i class="bx bxl-linkedin-square"> </i> </a>
    
           <a class="social-links-ankit" href="https://twitter.com/Ankit__Jodhani?ref=blog.ankitjodhani.com"><i class="bx bxl-twitter"></i> </a>
    
           <a class="social-links-ankit" href="https://github.com/AnkitJodhani?ref=blog.ankitjodhani.com"><i class="bx bxl-github"></i> </a>
  </div>
</body>
</html>

<!--kg-card-end: html-->

<!--kg-card-begin: html-->
<div class="toc"></div>

<!--kg-card-end: html-->
]]></content:encoded></item><item><title><![CDATA[AWS S3 Cost Optimization: Automate Cleanup of Abandoned Buckets]]></title><description><![CDATA[This blog explores an event-driven architecture that automatically identifies and cleans up abandoned S3 buckets to optimize AWS costs]]></description><link>https://blog.ankitjodhani.com/aws-s3-cost-optimization-automate-cleanup-of-abandoned-buckets/</link><guid isPermaLink="false">680677eb419ce0de7bb30d3a</guid><category><![CDATA[Lambda]]></category><category><![CDATA[Cost Optimization]]></category><category><![CDATA[AWS]]></category><category><![CDATA[AWS S3]]></category><category><![CDATA[Automation]]></category><dc:creator><![CDATA[Ankit Jodhani]]></dc:creator><pubDate>Mon, 21 Apr 2025 19:33:51 GMT</pubDate><media:content url="https://cdn.ankitjodhani.com/2025/04/s3-1.gif" medium="image"/><content:encoded><![CDATA[<img src="https://cdn.ankitjodhani.com/2025/04/s3-1.gif" alt="AWS S3 Cost Optimization: Automate Cleanup of Abandoned Buckets"><p></p><h2 id="%F0%9F%99%8B%E2%80%8D%E2%99%82%EF%B8%8F-introduction">&#x1F64B;&#x200D;&#x2642;&#xFE0F; Introduction</h2><p>Hey folks! I&apos;m&#xA0;<a href="https://www.linkedin.com/in/ankit-jodhani/?ref=blog.ankitjodhani.com" rel="noreferrer">Ankit</a>, working as a Kubernetes Engineer at CirrOps and a newly minted AWS Community Builder. I&#x2019;m passionate about Cloud and Container technologies. But today, I&apos;m switching gears to talk about something equally important - Cost Optimization, more specifically about AWS S3 buckets.</p><h2 id="%F0%9F%93%9A-synopsis">&#x1F4DA; Synopsis</h2><ul><li>In most projects, we spin up a large number of AWS S3 buckets while developing an app or testing, or running in production. It&apos;s often the go-to solution for media-related applications. But the problem here is, when testing applications or in production environments, we end up creating tons of buckets that eventually get abandoned by users, testers, employees, or applications. 
</li><li>These buckets not only lead to unnecessary costs but also create a management mess. Deleting those buckets by hand is boring, easy to mess up, and frankly, the last thing anyone wants to do on a Friday evening. So I spent a weekend putting together an event&#x2011;driven cleanup workflow that removes or retains buckets automatically for us.</li></ul><h2 id="%F0%9F%A4%A9-tldr-the-%E2%80%9C30%E2%80%91second%E2%80%9D-version">&#x1F929; TL;DR (The &#x201C;30&#x2011;Second&#x201D; Version)</h2><table>
<thead>
<tr>
<th><strong>Condition</strong></th>
<th><strong>Action</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td>Bucket not accessed in the last N days and tag <code>autoDelete=True</code></td>
<td>Delete bucket (all versions, then the bucket itself)</td>
</tr>
<tr>
<td>Bucket not accessed in the last N days and tag <code>autoDelete=False</code></td>
<td>Ignore it and leave it as it is</td>
</tr>
<tr>
<td>Bucket not accessed in the last N days and no valid tag</td>
<td>Notify user &#x279C; URL to Keep (adds tag) or Delete (deletes bucket)</td>
</tr>
</tbody>
</table>
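<p>The decision table above boils down to a tiny function. Here is a minimal Python sketch of that logic (the tag values and the 30-day threshold follow the table; the function itself is illustrative, not the actual Lambda code):</p><pre><code class="language-python">from datetime import datetime, timedelta, timezone

INACTIVITY_DAYS = 30  # the "N days" threshold from the table

def decide(last_access: datetime, auto_delete_tag) -> str:
    """Return the cleanup action for a bucket:
    'delete', 'ignore', or 'notify'."""
    idle = datetime.now(timezone.utc) - last_access
    if idle.days >= INACTIVITY_DAYS:  # bucket looks abandoned
        if auto_delete_tag == "True":
            return "delete"   # opted in to automatic cleanup
        if auto_delete_tag == "False":
            return "ignore"   # explicitly retained
        return "notify"       # missing/invalid tag: ask the owner
    return "ignore"           # recently accessed, never touch it

old = datetime.now(timezone.utc) - timedelta(days=90)
print(decide(old, "True"))  # prints "delete"
print(decide(old, None))    # prints "notify"</code></pre><p>In the real workflow, <code>last_access</code> would come from the DynamoDB table populated by CloudTrail events, and the tag value from the bucket&apos;s <code>autoDelete</code> tag.</p>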
<h3 id="%F0%9F%92%A1high-level-solution">&#x1F4A1;High-Level Solution</h3><p>Imagine every bucket in your account carries a simple tag:</p><ul><li><code>autoDelete=True</code> or <code>autoDelete=False</code></li></ul><p>A scheduled job or script runs daily and checks each bucket&#x2019;s last access date. If a bucket hasn&#x2019;t seen any activity in the last  30 days and its tag is:</p><ul><li><strong>True</strong> &#x2192; Automatically delete the bucket and all of its contents.</li><li><strong>False</strong> &#x2192; Leave the bucket alone.</li><li><strong>Missing or invalid</strong> &#x2192; Send the bucket owner an email with two options:<ol><li><strong>Keep it</strong> (tag it <code>autoDelete=False</code>)</li><li><strong>Delete it</strong> (confirm deletion)</li></ol></li></ul><p>This approach makes sure that we only remove truly abandoned buckets and gives users one-click control over exceptions.</p><h2 id="%E2%9C%85-prerequisites">&#x2705; Prerequisites</h2><ul><li>&#x1F4CD; An AWS account with administrative privileges</li><li>&#x1F4CD; Basic familiarity with Python and Boto3</li><li>&#x1F4CD; Understanding of AWS Lambda, EventBridge, SNS, API Gateway, and DynamoDB</li></ul><h2 id="%F0%9F%93%A6-list-of-aws-services">&#x1F4E6; List of AWS services</h2><ul><li>&#x1FAA3; Amazon S3</li><li>&#x1F4E8; AWS SNS</li><li>&#x26C5; CloudTrail</li><li>&#x1F680; DynamoDB</li><li>&#x1F5A5;&#xFE0F; AWS Lambda </li><li>&#x1F504; AWS EventBridge</li><li>&#x1F30F; Amazon API Gateway</li></ul><h2 id="%F0%9F%8E%AF-architecture">&#x1F3AF; Architecture</h2><p>Let&apos;s dive into how this all works together. 
</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://cdn.ankitjodhani.com/2025/04/s3.gif" class="kg-image" alt="AWS S3 Cost Optimization: Automate Cleanup of Abandoned Buckets" loading="lazy" width="1280" height="1664"><figcaption><span style="white-space: pre-wrap;">S3 Cost Optimization</span></figcaption></figure><p>A key question you might have is:  How can we determine when a bucket was last accessed?  There are several approaches, but I personally prefer using a combination of CloudTrail, Lambda, and DynamoDB</p><p>Here&apos;s the breakdown of the architecture:</p><h3 id="%E2%9B%85-cloudtrail">&#x26C5; CloudTrail:</h3><ul><li>AWS doesn&#x2019;t expose LastAccessed for a bucket out of the box, but CloudTrail records every object&#x2011;level API call. </li><li>I configured a data&#x2011;event trail (yes, it costs a little extra) and pointed it at the logging bucket <code>s3EventLoggingStorage</code> </li><li>Each time an object is listed, uploaded, downloaded, or deleted, the event lands in EventBridge and triggers s3EventLogger, which writes:</li></ul><pre><code class="language-json">{
    &quot;BucketName&quot;: &quot;xyz-terraform&quot;,
    &quot;EventDateTime&quot;: &quot;2024-07-24T00:41:03Z&quot;,
    &quot;EventName&quot;: &quot;ListObjects&quot;,
    &quot;EventDate&quot;: &quot;2024-07-24&quot;,
    &quot;EventTime&quot;: &quot;00:41:03Z&quot;,
    &quot;Status&quot;: &quot;Active&quot;
}
</code></pre>
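<p>For illustration, here is a rough sketch of how <code>s3EventLogger</code> might build that record before writing it to the <code>s3DateLogger</code> table. The <code>to_item</code> helper is hypothetical; <code>eventTime</code>, <code>eventName</code>, and <code>requestParameters.bucketName</code> are standard CloudTrail fields that EventBridge delivers under <code>detail</code>:</p>

```python
def to_item(detail: dict) -> dict:
    """Build the DynamoDB item from the CloudTrail record's 'detail' section."""
    event_time = detail["eventTime"]            # e.g. "2024-07-24T00:41:03Z"
    date, time = event_time.split("T")
    return {
        "BucketName": detail["requestParameters"]["bucketName"],
        "EventDateTime": event_time,
        "EventName": detail["eventName"],
        "EventDate": date,
        "EventTime": time,
        "Status": "Active",
    }

def handler(event, context):
    # boto3 is available in the Lambda runtime; imported lazily here so the
    # module also loads in environments without the AWS SDK installed
    import boto3
    table = boto3.resource("dynamodb").Table("s3DateLogger")
    table.put_item(Item=to_item(event["detail"]))
```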
<h2 id="%F0%9F%94%84-eventbridge">&#x1F504; EventBridge:</h2><p>EventBridge has two important components in our solution:</p><ul><li>Rule: Created a rule in the event bus with an event pattern that captures every S3 bucket event (except for our CloudTrail logging bucket <code>s3EventLoggingStorage</code>)<ul><li>This rule triggers a Lambda function named <code>s3EventLogger</code>  when the pattern matches.</li><li>The event pattern configuration looks like this:</li></ul></li></ul><pre><code class="language-json">{
  &quot;source&quot;: [&quot;aws.s3&quot;],
  &quot;detail-type&quot;: [&quot;AWS API Call via CloudTrail&quot;],
  &quot;detail&quot;: {
    &quot;eventSource&quot;: [&quot;s3.amazonaws.com&quot;],
    &quot;eventName&quot;: [&quot;ListObjects&quot;, &quot;ListObjectVersions&quot;, &quot;PutObject&quot;, &quot;GetObject&quot;, &quot;HeadObject&quot;, &quot;CopyObject&quot;, &quot;GetObjectAcl&quot;, &quot;PutObjectAcl&quot;, &quot;CreateMultipartUpload&quot;, &quot;ListParts&quot;, &quot;UploadPart&quot;, &quot;CompleteMultipartUpload&quot;, &quot;AbortMultipartUpload&quot;, &quot;UploadPartCopy&quot;, &quot;RestoreObject&quot;, &quot;DeleteObject&quot;, &quot;DeleteObjects&quot;, &quot;GetObjectTorrent&quot;, &quot;SelectObjectContent&quot;, &quot;PutObjectLockRetention&quot;, &quot;PutObjectLockLegalHold&quot;, &quot;GetObjectLockRetention&quot;, &quot;GetObjectLockLegalHold&quot;],
    &quot;requestParameters&quot;: {
      &quot;bucketName&quot;: [{
        &quot;anything-but&quot;: [&quot;s3EventLoggingStorage&quot;]
      }]
    }
  }
}

</code></pre>
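<p>If you prefer to create the rule programmatically instead of through the console, a Boto3 sketch could look like the following. The rule name and target ID are illustrative, the Lambda ARN is passed in by the caller, and the event-name list is abridged to a few entries (the real rule lists every object-level event name shown above):</p>

```python
import json

def build_pattern(logging_bucket: str) -> dict:
    """Rebuild the event pattern above, excluding the CloudTrail logging bucket."""
    return {
        "source": ["aws.s3"],
        "detail-type": ["AWS API Call via CloudTrail"],
        "detail": {
            "eventSource": ["s3.amazonaws.com"],
            # abridged; the full rule lists every object-level event name
            "eventName": ["ListObjects", "PutObject", "GetObject", "DeleteObject"],
            "requestParameters": {
                "bucketName": [{"anything-but": [logging_bucket]}]
            },
        },
    }

def create_rule(lambda_arn: str):
    import boto3  # lazy import so the module loads without the AWS SDK
    events = boto3.client("events")
    events.put_rule(
        Name="s3-bucket-activity",
        EventPattern=json.dumps(build_pattern("s3EventLoggingStorage")),
    )
    events.put_targets(
        Rule="s3-bucket-activity",
        Targets=[{"Id": "s3EventLogger", "Arn": lambda_arn}],
    )
```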
<ul><li>Scheduler: Executes the <code>s3Scanner</code> Lambda function daily at 9:00 AM </li></ul><h3 id="%F0%9F%96%A5%EF%B8%8F-lambda">&#x1F5A5;&#xFE0F; Lambda</h3><ul><li>We have a total of 3 Lambda functions working together:</li><li>1) <code>s3EventLogger</code>:<ul><li>Primary purpose: Record when any bucket receives any kind of API call </li><li>Collects event data such as BucketName, EventDate, etc. </li><li>Stores this data in a DynamoDB table called <code>s3DateLogger</code></li></ul></li><li>2) <code>s3Scanner</code>:<ul><li>Triggered every day at 9:00 AM UTC via an EventBridge schedule.</li><li>Lists all buckets in the account.</li><li>For each bucket:<ul><li>Fetch the last access date from DynamoDB (if no record exists in DynamoDB, then create an entry and set the date to 15 days in the future as a grace period - useful if you have older buckets)</li><li>Calculate days since the last access </li><li>Retrieve the <code>autoDelete</code> tag (if any)<ul><li>Decision logic:<ul><li><code>autoDelete=True</code> &amp;&amp; days &#x2265; 30 &#x2192; Delete all object versions, then the bucket</li><li><code>autoDelete=False</code> &amp;&amp; days &#x2265; 30 &#x2192; Skip deletion</li></ul></li></ul></li><li>No valid tag &amp;&amp; days &#x2265; 30 &#x2192; Publish a notification to SNS </li></ul></li></ul></li><li>3) <code>userHandler</code>:<ul><li>It will be triggered by the API Gateway </li><li>It either deletes the bucket or adds the <code>autoDelete=False</code> tag based on user choice</li></ul></li></ul><h3 id="%F0%9F%93%A7-sns">&#x1F4E7; SNS:</h3><ul><li>SNS sends an email to the user with two links:<ul><li>Keep It: Calls the API Gateway endpoint <code>?bucket_name={bucket_name}&amp;action=keep</code></li><li>Delete It: Calls the API Gateway endpoint <code>?bucket_name={bucket_name}&amp;action=delete</code></li></ul></li></ul><h3 id="%F0%9F%8C%8F-api-gateway">&#x1F30F; API Gateway: </h3><ul><li>Provides endpoints for users to respond to notifications, triggering the 
<code>userHandler</code> Lambda</li></ul><h2 id="%F0%9F%A7%91%E2%80%8D%F0%9F%92%BB-source-code">&#x1F9D1;&#x200D;&#x1F4BB; Source Code</h2><p>The GitHub repository contains:</p><ul><li><a href="https://github.com/AnkitJodhani/s3CostOptimization?ref=blog.ankitjodhani.com" rel="noreferrer">Link</a>:  Source code for all three Lambda functions</li><li><em>Note:</em> This was a weekend project, so the code has room for improvement. If you&#x2019;re a beginner, try extending it to fit your own environment. &#x1F60A;</li></ul><div class="kg-card kg-button-card kg-align-center"><a href="https://github.com/AnkitJodhani/s3CostOptimization?ref=blog.ankitjodhani.com" class="kg-btn kg-btn-accent">&#x1F446; GitHub Repository</a></div><h2 id="%F0%9F%99%8C-conclusion">&#x1F64C; Conclusion</h2><p>We explored how to automate S3 cleanup with minimal human intervention and potentially save a bunch of dollars in the process. Give it a try over the weekend or at your convenience. I hope you enjoyed this blog as much as I enjoyed creating it.</p><p>And that&#x2019;s a wrap! &#x1F64C;&#x1F942;</p><p>If you liked my work, please message me on LinkedIn with &quot;Hi + your country name&quot;</p><ul><li>&#x1F64B;&#x200D;&#x2642;&#xFE0F; Ankit Jodhani</li></ul><p>&#x1F4E8; Reach me at&#xA0;<a href="mailto:ankitjodhani1903@gmail.com" rel="nofollow noopener"><strong>ankitjodhani1903@gmail.com</strong></a></p>
<!--kg-card-begin: html-->
<!DOCTYPE html>
<html lang="en">
<head>
<link href="https://unpkg.com/boxicons@2.1.4/css/boxicons.min.css" rel="stylesheet">
</head>
  <style>
 .social-box {
    display: flex;
    padding: 0px 100px;
    justify-content: space-between;
}
    
 .social-box a {
      font-size: 100px;
      text-decoration: none;
   
    }   
</style>
<body>
  <div class="social-box">
       <a class="social-links-ankit" href="https://www.linkedin.com/in/ankit-jodhani/?ref=blog.ankitjodhani.com"><i class="bx bxl-linkedin-square"> </i> </a>
    
           <a class="social-links-ankit" href="https://twitter.com/Ankit__Jodhani?ref=blog.ankitjodhani.com"><i class="bx bxl-twitter"></i> </a>
    
           <a class="social-links-ankit" href="https://github.com/AnkitJodhani?ref=blog.ankitjodhani.com"><i class="bx bxl-github"></i> </a>
  </div>
</body>
</html>

<!--kg-card-end: html-->

<!--kg-card-begin: html-->
<div class="toc"></div>

<!--kg-card-end: html-->
]]></content:encoded></item><item><title><![CDATA[Ultimate Guide to Monitoring & Logging on AWS EKS: Prometheus, Grafana, Loki, and Promtail]]></title><description><![CDATA[Installation and configuration of Prometheus Operator, Grafana, Loki, and Promtail to ensure seamless application performance and effective log management. Learn to create custom metrics, set up alerting mechanisms, and visualize data through the Grafana dashboard monitoring solution]]></description><link>https://blog.ankitjodhani.com/ultimate-guide-monitoring-logging-aws-eks-prometheus-grafana-loki-promtail/</link><guid isPermaLink="false">66848ec0a44c003edac66d1e</guid><category><![CDATA[EKS]]></category><category><![CDATA[Docker]]></category><category><![CDATA[Terraform]]></category><category><![CDATA[Monitoring]]></category><category><![CDATA[Kubernetes]]></category><category><![CDATA[Prometheus]]></category><category><![CDATA[Grafana]]></category><category><![CDATA[Loki]]></category><category><![CDATA[Promtail]]></category><dc:creator><![CDATA[Ankit Jodhani]]></dc:creator><pubDate>Sun, 07 Jul 2024 15:34:25 GMT</pubDate><media:content url="https://cdn.ankitjodhani.com/2024/07/architecture.gif" medium="image"/><content:encoded><![CDATA[<h2 id="%F0%9F%99%8B%E2%80%8D%E2%99%82%EF%B8%8F-introduction">&#x1F64B;&#x200D;&#x2642;&#xFE0F; Introduction</h2><img src="https://cdn.ankitjodhani.com/2024/07/architecture.gif" alt="Ultimate Guide to Monitoring &amp; Logging on AWS EKS: Prometheus, Grafana, Loki, and Promtail"><p>Hello everyone, I&apos;m <a href="https://www.linkedin.com/in/ankit-jodhani/?ref=blog.ankitjodhani.com" rel="noreferrer">Ankit Jodhani</a>, a DevOps engineer passionate about Cloud and Container technologies. 
This blog is part of the #10weeksofcloudops series initiated by <a href="https://www.linkedin.com/in/piyush-sachdeva/?ref=blog.ankitjodhani.com" rel="noreferrer">Piyush Sachdeva</a>.</p><h2 id="%F0%9F%93%9A-synopsis">&#x1F4DA; Synopsis</h2><p>In this blog, we will implement monitoring and logging from scratch on an AWS EKS cluster using open-source tools like Prometheus, Grafana, Loki, and Promtail. We will also discuss writing custom metrics in a Node.js application, making them scrapable by Prometheus, setting up Alertmanager to receive email alerts, and designing custom dashboards in Grafana.</p><p>Once the monitoring part is over, we will move on to logging. We will set up Promtail and Loki to collect and aggregate logs, and finally visualize our logs using Grafana dashboards.</p><h3 id="%F0%9F%94%B8-story">&#x1F538; Story</h3><ul><li>Run Terraform scripts to create an EKS Cluster with necessary components.</li><li>Instrumentation: Understand prom-client to write custom metrics in a Node.js app and dockerize it.</li><li>Deploy the app on Kubernetes and make it accessible over the internet.</li><li>Install Prometheus Operator and Grafana on the EKS Cluster.</li><li>Configure Alerting rules, Service Monitors, and AlertManager for email alerts.</li><li>Create dashboards in Grafana to visualize the performance of the cluster and application.</li><li>Install Loki on the EKS Cluster and configure it with AWS S3 for log storage.</li><li>Install Promtail on the EKS Cluster and configure it to send logs to Loki.</li><li>Configure Grafana to display application logs.</li><li>Clean up all the infrastructure.</li></ul><h2 id="%E2%9C%85-prerequisites">&#x2705; Prerequisites</h2><ul><li>&#x1F4CC; AWS Account</li><li>&#x1F4CC; Basic knowledge of Terraform</li><li>&#x1F4CC; Basic knowledge of Docker</li><li>&#x1F4CC; Basic knowledge of Node.js (good to have)</li></ul><h2 id="%F0%9F%96%A5%EF%B8%8F-local-setup">&#x1F5A5;&#xFE0F; Local setup</h2><div class="kg-card kg-callout-card 
kg-callout-card-blue"><div class="kg-callout-emoji">&#x1F4A1;</div><div class="kg-callout-text">Ensure Terraform, Helm, and AWS CLI are installed on your computer. AWS CLI should be configured with admin privileges to avoid permission issues.</div></div><h2 id="%F0%9F%93%A6-list-of-aws-services">&#x1F4E6; List of AWS services</h2><ul><li>&#x1F451; Amazon EKS</li><li>&#x1F310; Amazon VPC</li><li>&#x1F512; Amazon IAM</li><li>&#x1F4BB; Amazon EC2</li><li>&#x2696;&#xFE0F; Amazon Autoscaling</li><li>&#x1FAA3;Amazon S3</li><li>&#x1F680; DynamoDB</li></ul><h2 id="%E2%98%B8%EF%B8%8F-list-of-kubernetes-tools-drivers">&#x2638;&#xFE0F; List of Kubernetes Tools &amp; Drivers</h2><ul><li>&#x1F4C2; EBS CSI Driver (EKS Addon)</li><li>&#x1F4CA; Helm charts</li><li>&#x1F9D1;&#x200D;&#x1F3ED; Prometheus Operator (using kube-prometheus-stack helm chart)</li><li>&#x1F514; Alertmanager (using kube-prometheus-stack helm chart)</li><li>&#x1F4BB; Grafana (using kube-prometheus-stack helm chart)</li><li>&#x1F5C3;&#xFE0F; Loki (using grafana helm chart)</li><li>&#x1F50E; Promtail (using grafana helm chart)</li></ul><h2 id="%E2%98%B8%EF%B8%8F-monitoring">&#x2638;&#xFE0F; Monitoring</h2><ul><li>Monitoring involves tracking the performance of your application and resources, and sending alerts when something is running slowly or failing, to prevent issues from escalating.</li></ul><h3 id="%F0%9F%93%8A-prometheus">&#x1F4CA; Prometheus</h3><ul><li>It is an open-source monitoring tool that tracks your workload and stores all your metrics in a time-series database.</li><li>We use PromQL to query the metrics</li><li>In this blog, we&apos;ll store data inside an AWS EBS volume.</li></ul><h3 id="%F0%9F%93%A2-alert-manager">&#x1F4E2; Alert manager</h3><ul><li>Alert Manager is a Prometheus component responsible for sending alerts to users.</li></ul><h2 id="%F0%9F%93%98-logging">&#x1F4D8; Logging </h2><ul><li>Logging helps you see what&apos;s happening inside your cluster, nodes, and how your application 
behaves in response to different requests and components, aiding in troubleshooting errors or bugs.</li></ul><h3 id="%F0%9F%93%9Cpromtail">&#x1F4DC;Promtail </h3><ul><li>Promtail is an open-source tool created by Grafana Labs. It collects all container logs and sends them to Loki.</li></ul><h3 id="%F0%9F%94%97-loki">&#x1F517; Loki</h3><ul><li>Loki is also an open-source tool designed and developed by Grafana Labs. It consumes data sent by Promtail or other tools, processes, and filters it.</li><li>We use LogQL to query the logs from Loki.</li><li>Loki can be integrated with many cloud services; in this blog, we&apos;ll use an AWS S3 bucket to store the logs. </li></ul><h3 id="%F0%9F%96%A5%EF%B8%8F-grafana">&#x1F5A5;&#xFE0F; Grafana</h3><ul><li>Grafana is a visualization tool commonly used for monitoring and logging. </li><li>Grafana can be integrated with Prometheus, Loki, and many other tools to create beautiful dashboards. </li><li>Grafana will query Prometheus &amp; Loki to get the metrics and logs.</li></ul><h2 id="%F0%9F%8E%AF-architecture">&#x1F3AF; Architecture</h2><p>Let&apos;s understand the architecture of the project. Understanding the architecture makes it easier to proceed with the practical steps.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://cdn.ankitjodhani.com/2024/07/architecture-1.gif" class="kg-image" alt="Ultimate Guide to Monitoring &amp; Logging on AWS EKS: Prometheus, Grafana, Loki, and Promtail" loading="lazy" width="945" height="1080"><figcaption><span style="white-space: pre-wrap;">Monitoring &amp; Logging on AWS EKS: Prometheus, Grafana, Loki, and Promtail</span></figcaption></figure><ul><li>As you can see in the architecture, Prometheus scrapes metrics from the application and cluster and stores them in AWS EBS Volumes to keep them persistent in case of pod failure. 
Similarly, Grafana &amp; Alertmanager will also store their data inside EBS volumes.</li><li>Promtail will collect all the logs from the nodes (application logs + component logs) and send those logs to Loki.  </li><li>Loki will aggregate &amp; process the logs and send them to the AWS S3 bucket. </li><li>Grafana will query Prometheus and Loki for metrics and logs.</li></ul><h2 id="%F0%9F%9A%80-step-by-step-guide">&#x1F680; Step-by-Step Guide</h2><h3 id="%F0%9F%92%BB-clone-the-repository">&#x1F4BB; Clone the repository</h3><ul><li>Please clone the<a href="https://github.com/AnkitJodhani/eks-monitoring-and-logging.git?ref=blog.ankitjodhani.com" rel="noreferrer"><u>&#xA0;</u>GitHub repository</a>&#xA0;on your local computer.</li></ul><pre><code class="language-bash">git clone https://github.com/AnkitJodhani/eks-monitoring-and-logging.git

cd eks-monitoring-and-logging
</code></pre>
<ul><li>Below you can see the directory structure and the purpose of each directory </li></ul><pre><code class="language-bash">&#x1F4C2;eks-monitoring-and-logging
&#x251C;&#x2500;&#x2500;&#x1F4C1;app-code
&#x2502;   &#x2514;&#x2500;&#x2500; (Code of nodejs application)
&#x251C;&#x2500;&#x2500;&#x1F4C1;app-k8s-manifest
&#x2502;   &#x2514;&#x2500;&#x2500; (Contents of kubernetes manifest files for nodejs app)
&#x251C;&#x2500;&#x2500;&#x1F4C1;eks-terraform
&#x2502;   &#x2514;&#x2500;&#x2500; (Contains Terraform script to create AWS EKS cluster)
&#x251C;&#x2500;&#x2500;&#x1F4C1;grafana-dashboard
&#x2502;   &#x2514;&#x2500;&#x2500; (Contains json file for grafana dashboard )
&#x251C;&#x2500;&#x2500;&#x1F4C1;kube-prometheus-stack
&#x2502;   &#x2514;&#x2500;&#x2500; (Kubernetes manifest file for prometheus operator)
&#x251C;&#x2500;&#x2500;&#x1F4C1;loki-promtail-stack
&#x2502;   &#x2514;&#x2500;&#x2500; (Contents of Loki &amp; Promtail)
&#x251C;&#x2500;&#x2500;&#x1F63A;.gitignore
&#x251C;&#x2500;&#x2500;&#x1F4C4;readme.md
&#x2514;&#x2500;&#x2500;&#x1F4C4;test.sh
</code></pre>
<h3 id="%F0%9F%A7%91%E2%80%8D%F0%9F%92%BB-instrumentation">&#x1F9D1;&#x200D;&#x1F4BB; Instrumentation</h3><ul><li>Instrumentation is the process of making code changes in the application to write custom metrics &amp; expose metrics.</li><li>Instrumentation helps in Monitoring Performance + gaining insight of the application </li><li>I already created a demo nodejs app to demonstrate the Instrumentation. you will find the code inside <code>app-code</code> directory. </li><li>Please read <code>index.js</code> file. here I&apos;ll share a brief overview of the code<ul><li>Express Setup: Initializes an Express application and sets up logging with Morgan.</li><li>Logging with Pino: Defines a custom logging function using Pino for structured logging.</li><li>Prometheus Metrics with prom-client: Integrates Prometheus for monitoring HTTP requests using the <code>prom-client</code> library:<ul><li><code>http_requests_total</code> counter</li><li><code>http_request_duration_seconds</code> histogram</li><li><code>http_request_duration_summary_seconds</code> summary</li><li><code>node_gauge_example</code> gauge for tracking async task duration</li></ul></li></ul></li><ul><li>Basic Routes:<ul><li><code>/</code> : Returns a &quot;Running&quot; status.</li><li><code>/healthy</code>: Returns the health status of the server.</li><li><code>/serverError</code>: Simulates a 500 Internal Server Error.</li><li><code>/notFound</code>: Simulates a 404 Not Found error.</li><li><code>/logs</code>: Generates logs using the custom logging function.</li><li><code>/crash</code>: Simulates a server crash by exiting the process.</li><li><code>/example</code>: Tracks async task duration with a gauge.</li><li><code>/metrics</code>: Exposes Prometheus metrics endpoint.</li></ul></li></ul><li>After adding the required metrics, Dockerize the application and push it to the container registry. 
In my case, I pushed it to the docker hub.</li></ul><h3 id="%F0%9F%91%91-eks-cluster-using-terraform">&#x1F451; EKS Cluster using Terraform</h3><ul><li>Now, let&apos;s go ahead and spin up the EKS Cluster.</li></ul><pre><code class="language-bash">cd eks-terraform/main
</code></pre>
<ul><li>In this directory, you will find all the config files for Terraform like <code>backend.tf</code> <code>terraform.tfvars</code> etc., and you can modify them based on your requirements, but the default settings will work fine for this project.</li><li>Initialize Terraform</li></ul><pre><code class="language-bash">terraform init
</code></pre>
<ul><li>Validate the script</li></ul><pre><code class="language-bash">terraform validate
</code></pre>
<ul><li>See the plan of what Terraform is going to install for us</li></ul><pre><code class="language-bash">terraform plan
</code></pre>
<ul><li>VPC, IAM Roles, EKS Cluster + Managed NodeGroup, EBS CSI driver (using AWS Addon) + IRSA (IAM role for service account)</li><li>Now, let&apos;s execute the terraform to create the AWS EKS Cluster</li></ul><pre><code class="language-bash">terraform apply --auto-approve
</code></pre>
<div class="kg-card kg-callout-card kg-callout-card-blue"><div class="kg-callout-emoji">&#x2623;&#xFE0F;</div><div class="kg-callout-text">Terraform takes approximately 20-30 minutes. So enjoy the automation &#x1F600; &#x2026;.</div></div><p>Once the above command completes successfully, you will have an EKS cluster running. Let&apos;s head over to the AWS console to verify that. </p><figure class="kg-card kg-gallery-card kg-width-wide kg-card-hascaption"><div class="kg-gallery-container"><div class="kg-gallery-row"><div class="kg-gallery-image"><img src="https://cdn.ankitjodhani.com/2024/07/2.png" width="1715" height="797" loading="lazy" alt="Ultimate Guide to Monitoring &amp; Logging on AWS EKS: Prometheus, Grafana, Loki, and Promtail"></div><div class="kg-gallery-image"><img src="https://cdn.ankitjodhani.com/2024/07/3.png" width="2353" height="245" loading="lazy" alt="Ultimate Guide to Monitoring &amp; Logging on AWS EKS: Prometheus, Grafana, Loki, and Promtail"></div></div><div class="kg-gallery-row"><div class="kg-gallery-image"><img src="https://cdn.ankitjodhani.com/2024/07/4.png" width="2239" height="299" loading="lazy" alt="Ultimate Guide to Monitoring &amp; Logging on AWS EKS: Prometheus, Grafana, Loki, and Promtail"></div><div class="kg-gallery-image"><img src="https://cdn.ankitjodhani.com/2024/07/5.png" width="2521" height="264" loading="lazy" alt="Ultimate Guide to Monitoring &amp; Logging on AWS EKS: Prometheus, Grafana, Loki, and Promtail"></div></div></div><figcaption><p><span style="white-space: pre-wrap;">AWS EKS Cluster using Terraform</span></p></figcaption></figure><ul><li>Let&apos;s update the <code>.kube/config</code> file to connect with the cluster</li></ul><pre><code class="language-bash">aws eks list-clusters --region us-east-1

aws eks update-kubeconfig --name monitoring-alerting-logging-eks-cluster --region us-east-1
</code></pre>
<ul><li>Now, we can review the K8s components </li></ul><pre><code class="language-bash">kubectl get all -n kube-system
</code></pre>
<h3 id="%F0%9F%A7%91%E2%80%8D%F0%9F%9A%80-deploy-nodejs-app">&#x1F9D1;&#x200D;&#x1F680; Deploy Nodejs app</h3><ul><li>Our EKS Cluster is running, and now we can deploy our Node.js application.</li><li>You will find the Kubernetes manifest files in the <code>app-k8s-manifest</code> directory. </li><li>You might want to change the image name in the <code>app-k8s-manifest/deployment.yml</code> file instead of going with <code>ankitjodhani/prometheus:learning</code>.</li><li>The <code>app-k8s-manifest/service.yml</code> will create a LoadBalancer to expose the app on the internet. Apply the file:</li></ul><pre><code class="language-bash">kubectl apply -k app-k8s-manifest/
</code></pre>
<ul><li>Head over to the AWS console and verify the load balancer (Classic Load Balancer).</li></ul><figure class="kg-card kg-image-card"><img src="https://cdn.ankitjodhani.com/2024/07/6.png" class="kg-image" alt="Ultimate Guide to Monitoring &amp; Logging on AWS EKS: Prometheus, Grafana, Loki, and Promtail" loading="lazy" width="2278" height="804"></figure><ul><li>Now, you can take the DNS name of the Load Balancer and visit the website.</li><li>It&apos;s good to generate a load using an automated script. In the root directory, you will find <code>test.sh</code>, which will generate the load by sending a lot of requests.</li><li>So, open another new terminal and execute the below command. just like shown in the below image.</li></ul><pre><code class="language-bash">./test.sh YOUR_LOAD_BALANCER_DNS_NAME
</code></pre>
<figure class="kg-card kg-image-card"><img src="https://cdn.ankitjodhani.com/2024/07/7.png" class="kg-image" alt="Ultimate Guide to Monitoring &amp; Logging on AWS EKS: Prometheus, Grafana, Loki, and Promtail" loading="lazy" width="1178" height="475"></figure><ul><li>Note: Keep running the <code>test.sh</code> and don&apos;t kill the terminal for a while.</li></ul><h3 id="%E2%9A%93-install-the-helm-chart">&#x2693; Install the Helm chart</h3><ul><li>Execute the below commands and install helm charts</li></ul><pre><code class="language-bash">helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
</code></pre>
<h3 id="%E2%9A%92%EF%B8%8F-install-configure-prometheus-operator">&#x2692;&#xFE0F; Install &amp; Configure Prometheus Operator</h3><ul><li>Now, let&apos;s install the Prometheus operator in the AWS EKS Cluster using the helm chart.</li><li>You will find all the Prometheus-related manifest files inside <code>kube-prometheus-stack</code> directory. </li></ul><pre><code class="language-bash">kubectl create ns monitoring

helm install monitoring prometheus-community/kube-prometheus-stack -n monitoring -f kube-prometheus-stack/custom_kube_prometheus_stack.yml
</code></pre>
<figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://cdn.ankitjodhani.com/2024/07/8.png" class="kg-image" alt="Ultimate Guide to Monitoring &amp; Logging on AWS EKS: Prometheus, Grafana, Loki, and Promtail" loading="lazy" width="1725" height="383"><figcaption><span style="white-space: pre-wrap;">Install Prometheus Operator on AWS EKS Cluster</span></figcaption></figure><ul><li>It&apos;s time to configure custom Alerts, an Alertmanager to receive emails, and a ServiceMonitor to scrape our application metrics.</li><li>Before configuring Alertmanager, we need credentials to send emails. For this blog, I&apos;m using Gmail, but any SMTP provider like AWS SES can be used.  so let&apos;s grab the credentials for that.</li><li>Open your Google account settings and search <code>App password</code>  &amp; create a new password. </li></ul><figure class="kg-card kg-image-card"><img src="https://cdn.ankitjodhani.com/2024/07/9.png" class="kg-image" alt="Ultimate Guide to Monitoring &amp; Logging on AWS EKS: Prometheus, Grafana, Loki, and Promtail" loading="lazy" width="985" height="488"></figure><figure class="kg-card kg-image-card"><img src="https://cdn.ankitjodhani.com/2024/07/10.png" class="kg-image" alt="Ultimate Guide to Monitoring &amp; Logging on AWS EKS: Prometheus, Grafana, Loki, and Promtail" loading="lazy" width="821" height="704"></figure><ul><li>Copy the newly created password. 
it should be like <code>uhnlqkdhnirpqfpy</code></li></ul><div class="kg-card kg-callout-card kg-callout-card-blue"><div class="kg-callout-emoji">&#x1F4DB;</div><div class="kg-callout-text">Ensure there are NO spaces in the password.</div></div><ul><li>Convert that password into<strong> base64 format.</strong> </li><li>Now, put your password in the <code>kube-prometheus-stack/email-secret.yml</code> and add your email ID to the <code>kube-prometheus-stack/alertmanagerconfig.yml</code> instead of mine.</li></ul><figure class="kg-card kg-image-card"><img src="https://cdn.ankitjodhani.com/2024/07/11.png" class="kg-image" alt="Ultimate Guide to Monitoring &amp; Logging on AWS EKS: Prometheus, Grafana, Loki, and Promtail" loading="lazy" width="1995" height="1335"></figure><ul><li>You can also take a look at <code>kube-prometheus-stack/alerts.yml</code> file to see the Rules that I&apos;ve set for the alerts.  <ul><li>Send an alert when the average node CPU is higher than 50%.</li><li>Send an alert when a POD restarts more than 2 times.</li></ul></li><li>Next, we will configure the Service Monitor to scrape the metrics from our Node.js application. </li><li>Refer to <code>kube-prometheus-stack/serviceMonitor.yml</code> for the configuration.</li></ul><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://cdn.ankitjodhani.com/2024/07/12.png" class="kg-image" alt="Ultimate Guide to Monitoring &amp; Logging on AWS EKS: Prometheus, Grafana, Loki, and Promtail" loading="lazy" width="1986" height="1326"><figcaption><span style="white-space: pre-wrap;">configure service monitor with Prometheus operator</span></figcaption></figure><ul><li>It&apos;s time to apply all these configurations. Execute below command</li></ul><pre><code class="language-bash">kubectl apply -k kube-prometheus-stack/
</code></pre>
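<p>To give a concrete picture of what such rules look like, here is a sketch of the pod-restart alert in <code>PrometheusRule</code> form. The metadata, labels, and exact expression here are illustrative; see <code>kube-prometheus-stack/alerts.yml</code> in the repo for the real file:</p>

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: custom-alerts
  namespace: monitoring
  labels:
    release: monitoring   # must match the Helm release so the operator discovers it
spec:
  groups:
    - name: app.rules
      rules:
        - alert: PodRestartingTooOften
          # fires once a container has restarted more than 2 times
          expr: kube_pod_container_status_restarts_total > 2
          labels:
            severity: critical
          annotations:
            summary: "Pod {{ $labels.pod }} has restarted more than 2 times"
```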
<ul><li>We need to wait for a couple of minutes for the Prometheus operator to reload its configuration.</li><li>Now, let&apos;s visit the Prometheus UI by running the following command and visiting <code>http://localhost:9090</code>.</li></ul><pre><code>kubectl port-forward -n monitoring service/prometheus-operated 9090:9090
</code></pre>
<figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://cdn.ankitjodhani.com/2024/07/13.png" class="kg-image" alt="Ultimate Guide to Monitoring &amp; Logging on AWS EKS: Prometheus, Grafana, Loki, and Promtail" loading="lazy" width="2559" height="438"><figcaption><span style="white-space: pre-wrap;">Prometheus Operator</span></figcaption></figure><ul><li>To check the applied rules, click on the <code>Alerts</code> button at the top.</li></ul><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://cdn.ankitjodhani.com/2024/07/14.png" class="kg-image" alt="Ultimate Guide to Monitoring &amp; Logging on AWS EKS: Prometheus, Grafana, Loki, and Promtail" loading="lazy" width="2550" height="763"><figcaption><span style="white-space: pre-wrap;">Configure Alert rules in Prometheus Operator</span></figcaption></figure><ul><li>Verify the target configuration by clicking the <code>Targets</code> button from the drop-down menu.</li></ul><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://cdn.ankitjodhani.com/2024/07/15.png" class="kg-image" alt="Ultimate Guide to Monitoring &amp; Logging on AWS EKS: Prometheus, Grafana, Loki, and Promtail" loading="lazy" width="2560" height="1072"><figcaption><span style="white-space: pre-wrap;">Configure service monitor in Prometheus Operator</span></figcaption></figure><ul><li>Let&apos;s access the Alertmanager UI to see the alert configurations. Run the following command and visit <code>http://localhost:9093</code>.</li></ul><pre><code class="language-bash">kubectl port-forward -n monitoring service/alertmanager-operated 9093:9093
</code></pre>
<figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://cdn.ankitjodhani.com/2024/07/16.png" class="kg-image" alt="Ultimate Guide to Monitoring &amp; Logging on AWS EKS: Prometheus, Grafana, Loki, and Promtail" loading="lazy" width="1177" height="545"><figcaption><span style="white-space: pre-wrap;">Configure Alertmanager in Prometheus Operator</span></figcaption></figure><ul><li>Click on the <code>Status</code> button at the top to see the applied configurations.</li></ul><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://cdn.ankitjodhani.com/2024/07/17.png" class="kg-image" alt="Ultimate Guide to Monitoring &amp; Logging on AWS EKS: Prometheus, Grafana, Loki, and Promtail" loading="lazy" width="1179" height="1416"><figcaption><span style="white-space: pre-wrap;">Configure Alermanger in Prometheus Operator</span></figcaption></figure><ul><li>Now, let&apos;s crash the Node.js app twice to receive alerts from Alertmanager.</li><li>The Nodejs app has a route <code>/crash</code>, which crashes the container, and Kubernetes automatically restarts it. However, if the app crashes more than 2 times, Alertmanager will send an alert to our email.</li><li>let&apos;s see that practically </li></ul><pre><code class="language-bash">http://YOUR_LOAD_BALANCER_DNS_NAME/crash
</code></pre>
<ul><li>Keep hitting the above endpoint until Kubernetes restarts at least 3 times.</li></ul><pre><code class="language-bash">kubectl get pods
</code></pre>
<figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://cdn.ankitjodhani.com/2024/07/18.png" class="kg-image" alt="Ultimate Guide to Monitoring &amp; Logging on AWS EKS: Prometheus, Grafana, Loki, and Promtail" loading="lazy" width="1472" height="138"><figcaption><span style="white-space: pre-wrap;">Prometheus Operator</span></figcaption></figure><ul><li>Check the alert in the firing state by running:</li></ul><pre><code class="language-bash">kubectl port-forward -n monitoring service/prometheus-operated 9090:9090
</code></pre>
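For context, alerts like this one are defined through a PrometheusRule resource that the operator picks up. Below is an illustrative sketch of a restart alert; the names, labels, and threshold are assumptions, so adjust them to match the rule actually shipped with this project's manifests.

```yaml
# Illustrative PrometheusRule (names and threshold are placeholders).
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: nodejs-restart-alert
  namespace: monitoring
  labels:
    release: monitoring   # must match the kube-prometheus-stack rule selector labels
spec:
  groups:
    - name: nodejs.rules
      rules:
        - alert: PodRestartingTooOften
          # fire when a container restarted more than 2 times within 5 minutes
          expr: increase(kube_pod_container_status_restarts_total{namespace="default"}[5m]) > 2
          for: 1m
          labels:
            severity: critical
          annotations:
            summary: "Pod {{ $labels.pod }} is restarting too often"
```

Because the rule carries the `release` label, the operator's rule selector picks it up automatically after `kubectl apply`.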
<figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://cdn.ankitjodhani.com/2024/07/19.png" class="kg-image" alt="Ultimate Guide to Monitoring &amp; Logging on AWS EKS: Prometheus, Grafana, Loki, and Promtail" loading="lazy" width="2559" height="585"><figcaption><span style="white-space: pre-wrap;">Alert rules in Prometheus Operator</span></figcaption></figure><ul><li>Verify Alertmanager received an alert from Prometheus:</li></ul><pre><code class="language-bash">kubectl port-forward -n monitoring service/alertmanager-operated 9093:9093
</code></pre>
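The email notifications and the 5-minute repeat interval come from the Alertmanager configuration. A minimal sketch of such a receiver is shown below; the SMTP host, addresses, and credentials are all placeholders for your own provider's values.

```yaml
# Illustrative Alertmanager config (all addresses/credentials are placeholders).
route:
  receiver: email-notifications
  repeat_interval: 5m            # resend the notification every 5 minutes while firing
receivers:
  - name: email-notifications
    email_configs:
      - to: you@example.com
        from: alerts@example.com
        smarthost: smtp.example.com:587
        auth_username: alerts@example.com
        auth_password: YOUR_SMTP_PASSWORD
        send_resolved: true      # also notify when the alert resolves
```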
<figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://cdn.ankitjodhani.com/2024/07/20.png" class="kg-image" alt="Ultimate Guide to Monitoring &amp; Logging on AWS EKS: Prometheus, Grafana, Loki, and Promtail" loading="lazy" width="1222" height="816"><figcaption><span style="white-space: pre-wrap;">Alertmanager configured in Prometheus Operator</span></figcaption></figure><ul><li>You should receive an email notification at the email address you configured.</li></ul><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://cdn.ankitjodhani.com/2024/07/21.0.png" class="kg-image" alt="Ultimate Guide to Monitoring &amp; Logging on AWS EKS: Prometheus, Grafana, Loki, and Promtail" loading="lazy" width="2274" height="767"><figcaption><span style="white-space: pre-wrap;">Configure Alertmanager in Prometheus Operator</span></figcaption></figure><ul><li>We configured it to send emails every 5 minutes until the issue is resolved.</li></ul><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://cdn.ankitjodhani.com/2024/07/21.1.png" class="kg-image" alt="Ultimate Guide to Monitoring &amp; Logging on AWS EKS: Prometheus, Grafana, Loki, and Promtail" loading="lazy" width="2283" height="151"><figcaption><span style="white-space: pre-wrap;">Configure Alertmanager in Prometheus Operator</span></figcaption></figure><ul><li>Now, it&apos;s time to visualize our metrics on a beautiful dashboard. Thankfully, the <code>kube-prometheus-stack</code> Helm chart automatically installs Grafana, so we don&apos;t need to install it separately. Access the Grafana UI at <code>http://localhost:8000</code>:</li></ul><pre><code class="language-bash">kubectl port-forward -n monitoring service/monitoring-grafana 8000:80
</code></pre>
<figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://cdn.ankitjodhani.com/2024/07/22.png" class="kg-image" alt="Ultimate Guide to Monitoring &amp; Logging on AWS EKS: Prometheus, Grafana, Loki, and Promtail" loading="lazy" width="2559" height="1351"><figcaption><span style="white-space: pre-wrap;">Grafana in Prometheus operator</span></figcaption></figure><ul><li>You will see many pre-built dashboards. You can utilize them for monitoring or design/import your own.</li><li>Import the dashboard I created for the Node.js app, available in the <code>grafana-dashboard</code> directory.</li><li>Click on the <code>New</code> button at the top right, select <code>Import</code> from the drop-down menu, and import the dashboard.</li></ul><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://cdn.ankitjodhani.com/2024/07/23.png" class="kg-image" alt="Ultimate Guide to Monitoring &amp; Logging on AWS EKS: Prometheus, Grafana, Loki, and Promtail" loading="lazy" width="1142" height="719"><figcaption><span style="white-space: pre-wrap;">Grafana in Prometheus Operator</span></figcaption></figure><ul><li>Once imported, you will see a screen similar to mine, as shown below, if you haven&apos;t stopped the <code>test.sh</code> (load generator script).</li></ul><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://cdn.ankitjodhani.com/2024/07/24.png" class="kg-image" alt="Ultimate Guide to Monitoring &amp; Logging on AWS EKS: Prometheus, Grafana, Loki, and Promtail" loading="lazy" width="2559" height="1428"><figcaption><span style="white-space: pre-wrap;">Grafana dashboard in Prometheus Operator</span></figcaption></figure><ul><li>This is how we can monitor our application, other components, and clusters from Grafana. 
</li></ul><h3 id="%E2%9A%92%EF%B8%8F-install-configure-loki">&#x2692;&#xFE0F; Install &amp; configure Loki </h3><ul><li>We&apos;ve set up monitoring; now let&apos;s configure Loki and Promtail for logging.</li><li>We already added the Grafana Helm repo in the previous step, which includes both Loki and Promtail.</li><li>We want Loki to store logs in an AWS S3 bucket, so it needs a bucket and the permissions to write logs to it.</li><li>Head over to the AWS S3 console and create a bucket with a unique name. </li></ul><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://cdn.ankitjodhani.com/2024/07/25.png" class="kg-image" alt="Ultimate Guide to Monitoring &amp; Logging on AWS EKS: Prometheus, Grafana, Loki, and Promtail" loading="lazy" width="2533" height="493"><figcaption><span style="white-space: pre-wrap;">Configure Loki to send logs to AWS S3 Bucket</span></figcaption></figure><ul><li>Next, create an IAM policy in the AWS console. You can find the policy in <code>loki-promtail-stack/aws-s3-policy.json</code>, but remember to add your bucket&apos;s ARN.</li></ul><pre><code class="language-json">{
    &quot;Version&quot;: &quot;2012-10-17&quot;,
    &quot;Statement&quot;: [
        {
            &quot;Sid&quot;: &quot;Stmt1719324853777&quot;,
            &quot;Action&quot;: [
                &quot;s3:ListBucket&quot;,
                &quot;s3:GetBucketLocation&quot;
            ],
            &quot;Effect&quot;: &quot;Allow&quot;,
            &quot;Resource&quot;: &quot;ARN_OF_YOUR_BUCKET&quot;
        },
        {
            &quot;Sid&quot;: &quot;Stmt1719324853778&quot;,
            &quot;Action&quot;: &quot;s3:*&quot;,
            &quot;Effect&quot;: &quot;Allow&quot;,
            &quot;Resource&quot;: &quot;ARN_OF_YOUR_BUCKET/*&quot;
        }
    ]
}

</code></pre>
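If you prefer working from the terminal, you can substitute your bucket ARN and sanity-check the policy JSON locally before creating it in the IAM console. The ARN and file path below are placeholders.

```shell
# Build the policy with your bucket ARN and validate the JSON locally.
BUCKET_ARN="arn:aws:s3:::YOUR_BUCKET_NAME"   # placeholder: your bucket's ARN
cat > /tmp/loki-s3-policy.json <<EOF
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
            "Effect": "Allow",
            "Resource": "${BUCKET_ARN}"
        },
        {
            "Action": "s3:*",
            "Effect": "Allow",
            "Resource": "${BUCKET_ARN}/*"
        }
    ]
}
EOF
# json.tool fails loudly on malformed JSON, so this catches typos early.
python3 -m json.tool /tmp/loki-s3-policy.json > /dev/null && echo "policy JSON is valid"
```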
<figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://cdn.ankitjodhani.com/2024/07/26.png" class="kg-image" alt="Ultimate Guide to Monitoring &amp; Logging on AWS EKS: Prometheus, Grafana, Loki, and Promtail" loading="lazy" width="2527" height="952"><figcaption><span style="white-space: pre-wrap;">Configure Loki to send logs to AWS S3 Buckets</span></figcaption></figure><ul><li>Let&apos;s create an IAM user, attach the policy, and generate an <code>access_key_id</code> and <code>secret_access_key</code>.</li></ul><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://cdn.ankitjodhani.com/2024/07/27.png" class="kg-image" alt="Ultimate Guide to Monitoring &amp; Logging on AWS EKS: Prometheus, Grafana, Loki, and Promtail" loading="lazy" width="2509" height="1140"><figcaption><span style="white-space: pre-wrap;">Configure Loki to send logs to the AWS S3 bucket</span></figcaption></figure><ul><li>Now we are ready to configure Loki.</li><li>Let&apos;s first export the default values.yml file into <code>loki_distributed_values.yml</code>:</li></ul><pre><code class="language-bash">helm show values grafana/loki-distributed &gt; loki-promtail-stack/loki_distributed_values.yml
</code></pre>
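After exporting the defaults, the S3-related overrides look roughly like the sketch below. The exact key layout depends on the loki-distributed chart version, and the bucket name, region, and keys here are placeholders, so cross-check against the screenshots and the provided custom_loki_distributed_values.yml.

```yaml
# Sketch of the S3 storage overrides (bucket, region, and keys are placeholders).
loki:
  storageConfig:
    aws:
      s3: s3://us-east-1/YOUR_BUCKET_NAME
      region: us-east-1
      access_key_id: YOUR_ACCESS_KEY_ID
      secret_access_key: YOUR_SECRET_ACCESS_KEY
    boltdb_shipper:
      shared_store: s3
```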
<ul><li><code>loki_distributed_values.yml</code> has all the default settings, but we have to make some changes to configure the AWS S3 bucket. </li><li>For reference, the screenshots below show which values I&apos;ve changed in the file. </li></ul><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://cdn.ankitjodhani.com/2024/07/28.png" class="kg-image" alt="Ultimate Guide to Monitoring &amp; Logging on AWS EKS: Prometheus, Grafana, Loki, and Promtail" loading="lazy" width="1973" height="201"><figcaption><span style="white-space: pre-wrap;">Configure Loki to send logs to AWS S3 bucket</span></figcaption></figure><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://cdn.ankitjodhani.com/2024/07/29-1.png" class="kg-image" alt="Ultimate Guide to Monitoring &amp; Logging on AWS EKS: Prometheus, Grafana, Loki, and Promtail" loading="lazy" width="2443" height="1092"><figcaption><span style="white-space: pre-wrap;">Configure Loki to send logs to the AWS S3 bucket</span></figcaption></figure><ul><li>I also created an updated configuration file, <code>loki-promtail-stack/custom_loki_distributed_values.yml</code>, with all necessary changes.</li><li>Ensure you add your bucket name, region, access key ID, and secret access key.</li><li>Now, let&apos;s install Loki on the cluster using the Helm chart. Run the command below:</li></ul><pre><code class="language-bash">helm install loki grafana/loki-distributed -n monitoring -f loki-promtail-stack/custom_loki_distributed_values.yml
</code></pre>
<figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://cdn.ankitjodhani.com/2024/07/31.0.png" class="kg-image" alt="Ultimate Guide to Monitoring &amp; Logging on AWS EKS: Prometheus, Grafana, Loki, and Promtail" loading="lazy" width="1666" height="622"><figcaption><span style="white-space: pre-wrap;">Install Loki using Helm chart on AWS EKS cluster</span></figcaption></figure><ul><li>Yup! We&apos;ve installed Loki successfully. </li></ul><h3 id="%E2%9A%92%EF%B8%8F-install-configure-promtail">&#x2692;&#xFE0F; Install &amp; configure Promtail</h3><ul><li>Now, let&apos;s set up the log collector, Promtail. We already have the Promtail Helm chart in the Grafana repo.</li><li>Since everything is installed in the <code>monitoring</code> namespace, we need to change one endpoint in Promtail&apos;s default configuration.</li><li>Run the command below to export the default configuration (values.yml) to <code>loki-promtail-stack/promtail_values.yml</code>:</li></ul><pre><code class="language-bash">helm show values grafana/promtail &gt; loki-promtail-stack/promtail_values.yml
</code></pre>
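The change points Promtail at the Loki gateway service inside the monitoring namespace. In the exported values it looks roughly like the sketch below; the key layout may vary slightly by chart version, so compare with the screenshot and custom_promtail_values.yml.

```yaml
# Sketch: send logs to the loki-distributed gateway in the monitoring namespace.
config:
  clients:
    - url: http://loki-loki-distributed-gateway.monitoring.svc.cluster.local/loki/api/v1/push
```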
<ul><li>We have to change the <code>clients.url</code> attribute so Promtail knows where to send the logs. Refer to the image below.</li></ul><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://cdn.ankitjodhani.com/2024/07/32.0.png" class="kg-image" alt="Ultimate Guide to Monitoring &amp; Logging on AWS EKS: Prometheus, Grafana, Loki, and Promtail" loading="lazy" width="2554" height="294"><figcaption><span style="white-space: pre-wrap;">Configure Promtail on AWS EKS Cluster</span></figcaption></figure><ul><li>I also provided an updated configuration file, <code>loki-promtail-stack/custom_promtail_values.yml</code>.</li><li>Now that the configuration is done, let&apos;s install Promtail with the command below:</li></ul><pre><code class="language-bash">helm install promtail grafana/promtail -n monitoring -f loki-promtail-stack/custom_promtail_values.yml
</code></pre>
<figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://cdn.ankitjodhani.com/2024/07/31.1.png" class="kg-image" alt="Ultimate Guide to Monitoring &amp; Logging on AWS EKS: Prometheus, Grafana, Loki, and Promtail" loading="lazy" width="1666" height="648"><figcaption><span style="white-space: pre-wrap;">Install and configure Promtail on AWS EKS</span></figcaption></figure><ul><li>Now, let&apos;s go ahead and see our logs in the Grafana dashboard. Port-forward the Grafana service again (as shown earlier) and access it at <code>http://localhost:8000</code>.</li><li>Before adding a new dashboard, we need to add a new data source so Grafana can query logs from Loki.</li><li>So let&apos;s add a new data source; see the images below for reference.</li><li>Add a new data source with the URL <code>http://loki-loki-distributed-gateway.monitoring.svc.cluster.local</code></li></ul><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://cdn.ankitjodhani.com/2024/07/32.png" class="kg-image" alt="Ultimate Guide to Monitoring &amp; Logging on AWS EKS: Prometheus, Grafana, Loki, and Promtail" loading="lazy" width="2357" height="760"><figcaption><span style="white-space: pre-wrap;">Configure Loki and Promtail on AWS EKS Cluster</span></figcaption></figure><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://cdn.ankitjodhani.com/2024/07/33.png" class="kg-image" alt="Ultimate Guide to Monitoring &amp; Logging on AWS EKS: Prometheus, Grafana, Loki, and Promtail" loading="lazy" width="1030" height="519"><figcaption><span style="white-space: pre-wrap;">Configure Loki and Promtail on AWS EKS Cluster</span></figcaption></figure><ul><li>We&apos;ve successfully added a data source.
Now, import the community dashboard by typing <code>15414</code> and selecting Loki as the data source.</li></ul><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://cdn.ankitjodhani.com/2024/07/34.png" class="kg-image" alt="Ultimate Guide to Monitoring &amp; Logging on AWS EKS: Prometheus, Grafana, Loki, and Promtail" loading="lazy" width="1238" height="838"><figcaption><span style="white-space: pre-wrap;">Configure Loki and Promtail on AWS EKS Cluster</span></figcaption></figure><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://cdn.ankitjodhani.com/2024/07/35.png" class="kg-image" alt="Ultimate Guide to Monitoring &amp; Logging on AWS EKS: Prometheus, Grafana, Loki, and Promtail" loading="lazy" width="1135" height="869"><figcaption><span style="white-space: pre-wrap;">Configure Loki and Promtail on AWS EKS Cluster</span></figcaption></figure><ul><li>You can now see all the logs in Grafana. Apply filters to get logs for a specific namespace or container.</li></ul><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://cdn.ankitjodhani.com/2024/07/36.png" class="kg-image" alt="Ultimate Guide to Monitoring &amp; Logging on AWS EKS: Prometheus, Grafana, Loki, and Promtail" loading="lazy" width="2547" height="841"><figcaption><span style="white-space: pre-wrap;">Configure Loki and Promtail on AWS EKS Cluster</span></figcaption></figure><ul><li>Now, let&apos;s generate logs from our application; select the <code>default</code> namespace from the dropdown menu at the top to view them.</li><li>You can run the <code>test.sh</code> script or visit <code>http://YOUR_LOAD_BALANCER_DNS_NAME/logs</code> in the browser.</li></ul><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://cdn.ankitjodhani.com/2024/07/37-1.png" class="kg-image" alt="Ultimate Guide to Monitoring &amp; Logging on AWS EKS: Prometheus, Grafana, Loki, and Promtail" loading="lazy" width="2559" height="841"><figcaption><span
style="white-space: pre-wrap;">Configure Loki and Promtail on AWS EKS Cluster</span></figcaption></figure><ul><li>Lastly, verify that Loki is sending logs to the S3 bucket by checking the folders created by Loki in the AWS S3 console.</li></ul><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://cdn.ankitjodhani.com/2024/07/38.png" class="kg-image" alt="Ultimate Guide to Monitoring &amp; Logging on AWS EKS: Prometheus, Grafana, Loki, and Promtail" loading="lazy" width="1501" height="546"><figcaption><span style="white-space: pre-wrap;">Configure Loki and Promtail on AWS EKS Cluster</span></figcaption></figure><ul><li>Yes, we can see the logs are available inside our AWS S3 bucket. </li></ul><h2 id="%F0%9F%A7%BC-cleanup">&#xA0;&#x1F9FC; Cleanup</h2><ul><li>It&apos;s time to clean up what we&apos;ve created to avoid unnecessary costs.</li><li>First, delete the Node.js application from Kubernetes:</li></ul><pre><code class="language-bash">kubectl delete -k app-k8s-manifest/
</code></pre>
<ul><li>Next, uninstall the Helm charts, since Prometheus, Grafana, and Alertmanager created AWS EBS volumes:</li></ul><pre><code class="language-bash">helm uninstall monitoring -n monitoring

helm uninstall loki -n monitoring

helm uninstall promtail -n monitoring
</code></pre>
<ul><li>Let&apos;s also delete the monitoring namespace:</li></ul><pre><code class="language-bash">kubectl delete ns monitoring
</code></pre>
<ul><li>Also, make sure no Persistent Volumes are left behind, because anything left over will cause trouble for Terraform when destroying the infrastructure.</li></ul><pre><code class="language-bash">kubectl get pv
</code></pre>
<ul><li>Finally, let&apos;s destroy our AWS EKS cluster. Navigate to the <code>eks-terraform/main/</code> directory and run the commands below:</li></ul><pre><code class="language-bash">cd eks-terraform/main/

terraform destroy --auto-approve
</code></pre>
<ul><li>After executing the above command, you will not have any resources in your AWS account.</li></ul><h2 id="%F0%9F%99%8C-conclusion">&#x1F64C; Conclusion</h2><ul><li>In this blog, we&apos;ve comprehensively walked through setting up a monitoring and logging stack on AWS EKS using Prometheus, Grafana, Loki, and Promtail.</li><li>From deploying a Node.js application with custom metrics to visualizing logs and metrics in Grafana, we&apos;ve covered the entire process step-by-step.</li><li>I aimed to cover all necessary details and best practices, but it&apos;s not possible to fit everything into one blog, so I recommend digging deeper into my Terraform code, Kubernetes manifest files, and the rest of the directories.</li><li>As a next step, you can implement CI/CD for Terraform (GitOps approach).</li></ul><p>And here it ends... &#x1F64C;&#x1F942;<br><br>If you like my work, please message me on LinkedIn with&#xA0;<strong><em>&quot;Hi and your country name&quot;</em></strong></p><p>-&#x1F64B;&#x200D;&#x2642;&#xFE0F; Ankit Jodhani.</p><p>&#x1F4E8; reach me at&#xA0;<strong>ankitjodhani1903@gmail.com</strong></p><div class="kg-card kg-button-card kg-align-center"><a href="https://github.com/AnkitJodhani/eks-private-container-registry.git?ref=blog.ankitjodhani.com" class="kg-btn kg-btn-accent">&#x1F446; GitHub Repository</a></div>
<!--kg-card-begin: html-->
<!DOCTYPE html>
<html lang="en">
<head>
<link href="https://unpkg.com/boxicons@2.1.4/css/boxicons.min.css" rel="stylesheet">
</head>
  <style>
 .social-box {
    display: flex;
    padding: 0px 100px;
    justify-content: space-between;
}
    
 .social-box a {
      font-size: 100px;
      text-decoration: none;
   
    }   
</style>
<body>
  <div class="social-box">
       <a class="social-links-ankit" href="https://www.linkedin.com/in/ankit-jodhani/?ref=blog.ankitjodhani.com"><i class="bx bxl-linkedin-square"> </i> </a>
    
           <a class="social-links-ankit" href="https://twitter.com/Ankit__Jodhani?ref=blog.ankitjodhani.com"><i class="bx bxl-twitter"></i> </a>
    
           <a class="social-links-ankit" href="https://github.com/AnkitJodhani?ref=blog.ankitjodhani.com"><i class="bx bxl-github"></i> </a>
  </div>
</body>
</html>

<!--kg-card-end: html-->
<h2 id="%F0%9F%8E%92-resources">&#x1F392; Resources</h2><p><a href="https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack?ref=blog.ankitjodhani.com">https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack</a></p><p><a href="https://github.com/grafana/helm-charts/tree/main/charts/loki-distributed?ref=blog.ankitjodhani.com">https://github.com/grafana/helm-charts/tree/main/charts/loki-distributed</a></p><p><a href="https://github.com/grafana/helm-charts/tree/main/charts/promtail?ref=blog.ankitjodhani.com">https://github.com/grafana/helm-charts/tree/main/charts/promtail</a></p><p><a href="https://dev.to/aws-builders/monitoring-eks-cluster-with-prometheus-and-grafana-1kpb?ref=blog.ankitjodhani.com">https://dev.to/aws-builders/monitoring-eks-cluster-with-prometheus-and-grafana-1kpb</a></p><p><a href="https://github.com/grafana/loki/issues/7335?ref=blog.ankitjodhani.com">https://github.com/grafana/loki/issues/7335</a></p><p><a href="https://stackoverflow.com/questions/76873980/loki-s3-configuration-for-chunks-and-indexes?ref=blog.ankitjodhani.com">https://stackoverflow.com/questions/76873980/loki-s3-configuration-for-chunks-and-indexes</a></p><p><a href="https://blog.srev.in/posts/grafana-loki-with-amazon-s3/?ref=blog.ankitjodhani.com">https://blog.srev.in/posts/grafana-loki-with-amazon-s3/</a></p><p><a href="https://akyriako.medium.com/kubernetes-logging-with-grafana-loki-promtail-in-under-10-minutes-d2847d526f9e?ref=blog.ankitjodhani.com">https://akyriako.medium.com/kubernetes-logging-with-grafana-loki-promtail-in-under-10-minutes-d2847d526f9e</a></p>
<!--kg-card-begin: html-->
<div class="toc"></div>

<!--kg-card-end: html-->
]]></content:encoded></item><item><title><![CDATA[Deploy a Private Docker Container Registry on Kubernetes (EKS)]]></title><description><![CDATA[Learn how to deploy a private Docker container registry on Kubernetes (EKS) using Terraform. This guide covers setting up an EKS cluster, deploying Kubernetes components, and configuring persistent storage, ensuring a scalable and secure setup. Ideal for those preparing for the CKA exam.]]></description><link>https://blog.ankitjodhani.com/deploy-a-private-docker-container-registry-on-kubernetes-eks/</link><guid isPermaLink="false">66674f56c38c8136065d7191</guid><category><![CDATA[EKS]]></category><category><![CDATA[Kubernetes]]></category><category><![CDATA[Terraform]]></category><category><![CDATA[Docker]]></category><dc:creator><![CDATA[Ankit Jodhani]]></dc:creator><pubDate>Fri, 07 Jun 2024 10:16:00 GMT</pubDate><media:content url="https://cdn.ankitjodhani.com/2024/06/architecture.gif" medium="image"/><content:encoded><![CDATA[<h2 id="%F0%9F%99%8B%E2%80%8D%E2%99%82%EF%B8%8F-introduction">&#x1F64B;&#x200D;&#x2642;&#xFE0F; Introduction</h2><img src="https://cdn.ankitjodhani.com/2024/06/architecture.gif" alt="Deploy a Private Docker Container Registry on Kubernetes (EKS)"><p>Hi Everyone, I&apos;m <a href="https://www.linkedin.com/in/ankit-jodhani/?ref=blog.ankitjodhani.com" rel="noreferrer">Ankit Jodhani</a>, a freelance DevOps engineer, and I love sharing my knowledge publicly. This blog is part of the #10WeeksofCloudOps series initiated by <a href="https://www.linkedin.com/in/piyush-sachdeva/?ref=blog.ankitjodhani.com" rel="noreferrer">Piyush Sachdeva</a>. I want to thank Piyush Sachdeva for providing valuable guidance through the journey.</p><h2 id="%F0%9F%93%9A-synopsis">&#x1F4DA; Synopsis</h2><p>In this blog, we will deploy a private container registry on Kubernetes, specifically on AWS EKS. This project will be very useful if you are preparing for the Certified Kubernetes Administrator (CKA) exam. 
Our container repository will function similarly to Docker Hub, with an attached domain name, allowing you to push and pull images just as you would with Docker Hub. We will also ensure our registry has a persistent volume to avoid losing container images in case of pod or cluster failure.</p><h3 id="%F0%9F%94%B8-story">&#x1F538; Story</h3><ul><li>First, we will create the AWS EKS Cluster and all related components, such as IRSA (IAM Role for Service Account), and deploy a Helm chart using Terraform.</li><li>Once we have the infrastructure ready, we will deploy all the necessary Kubernetes components:</li><li>&#x2693; Kubernetes Components or Objects<ul><li>&#x1F680; Deployment</li><li>&#x1F6CE;&#xFE0F; Services</li><li>&#x1F510; ConfigMap &amp; Secret </li><li>&#x2699;&#xFE0F; Ingress</li><li>&#x1F6A8; Network Policy </li><li>&#x1F4C1; Persistent Volume </li><li>&#x1F4C2; Persistent Volume Claim </li><li>&#x1F5C3;&#xFE0F; Storage Class</li><li>&#x1F6A7; Namespace</li></ul></li><li>After deploying the components, access the private Docker container registry using the domain name attached to it.</li></ul><h2 id="%E2%9C%85-prerequisites">&#x2705; Prerequisites</h2><ul><li>&#x1F4CC; AWS Account</li><li>&#x1F4CC; Hosted zone in Route 53 (Domain name)</li><li>&#x1F4CC; Basic knowledge of Terraform</li><li>&#x1F4CC; Basic knowledge of Docker</li></ul><h2 id="%F0%9F%96%A5%EF%B8%8F-local-setup">&#x1F5A5;&#xFE0F; Local setup</h2><div class="kg-card kg-callout-card kg-callout-card-blue"><div class="kg-callout-emoji">&#x1F4A1;</div><div class="kg-callout-text">Ensure Terraform and AWS CLI are installed and configured with administrative privileges to avoid permission issues. 
</div></div><h2 id="%F0%9F%93%A6-list-of-aws-services">&#x1F4E6; List of AWS services</h2><ul><li>&#x1F451; Amazon EKS </li><li>&#x1F310; Amazon VPC</li><li>&#x1F512; AWS IAM</li><li>&#x1F4BB; Amazon EC2</li><li>&#x2696;&#xFE0F; Amazon EC2 Auto Scaling</li><li>&#x1FAA3; Amazon S3</li><li>&#x1F680; Amazon DynamoDB</li></ul><h2 id="%E2%98%B8%EF%B8%8F-list-of-kubernetes-tools-drivers">&#x2638;&#xFE0F; List of Kubernetes Tools &amp; Drivers</h2><ul><li>&#x1F3CB; AWS Load Balancer Controller</li><li>&#x1F310; ExternalDNS</li><li>&#x1F4C2; EFS CSI Driver (EKS Addon)</li></ul><h2 id="%F0%9F%8E%AF-architecture">&#x1F3AF; Architecture</h2><p>It&apos;s time to understand the architecture of the project; a clear picture of the moving parts simplifies the process and boosts confidence in following the steps.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://cdn.ankitjodhani.com/2024/06/architecture-1.gif" class="kg-image" alt="Deploy a Private Docker Container Registry on Kubernetes (EKS)" loading="lazy" width="1280" height="1664"><figcaption><span style="white-space: pre-wrap;">Deploy a Private Docker Container Registry on Kubernetes (EKS)</span></figcaption></figure><h2 id="%F0%9F%9A%80-step-by-step-guide">&#x1F680; Step-by-Step Guide</h2><ul><li>Please clone the <a href="https://github.com/AnkitJodhani/eks-private-container-registry.git?ref=blog.ankitjodhani.com" rel="noreferrer">GitHub repository</a> on your local computer.</li></ul><pre><code class="language-bash">git clone https://github.com/AnkitJodhani/eks-private-container-registry.git

cd eks-private-container-registry
</code></pre>
<ul><li>You will find two directories inside the directory.</li></ul><pre><code class="language-md">&#x1F4C2;eks-private-container-registry
&#x251C;&#x2500;&#x2500;&#x1F4C1;kubernetes
&#x2502;   &#x2514;&#x2500;&#x2500; It has all the Kubernetes manifest
&#x251C;&#x2500;&#x2500;&#x1F4C1; terraform
&#x2502;   &#x2514;&#x2500;&#x2500; It has all the Terraform script
&#x2514;&#x2500;&#x2500;&#x1F63A;.gitignore
</code></pre>
<ul><li>First, we will create the whole infrastructure using Terraform. so please navigate to the <code>terraform/main</code> directory </li></ul><pre><code class="language-sh">cd eks-private-container-registry/terraform/main
</code></pre>
<ul><li>Here you will find all the configuration files, like <code>backend.tf</code>, <code>providers.tf</code>, and <code>terraform.tfvars</code>. You can customize them as needed, but the default settings work fine for this project.</li><li>Now, let&apos;s initialize Terraform:</li></ul><pre><code class="language-sh">terraform init
</code></pre>
<ul><li>Validate the configuration:</li></ul><pre><code class="language-sh">terraform validate
</code></pre>
<ul><li>Let&apos;s see the plan of what Terraform is going to install for us:</li></ul><pre><code class="language-sh">terraform plan
</code></pre>
<ul><ul><li>VPC, IAM Roles, EKS Cluster + Managed NodeGroup, EFS CSI driver (using AWS Addon) + IRSA (IAM role for service account), AWS Load Balancer Controller (using Helm Chart)+IRSA (IAM role for service account), ExternalDNS(using Helm Chart) + IRSA (IAM role for service account)</li></ul></ul><div class="kg-card kg-callout-card kg-callout-card-blue"><div class="kg-callout-emoji">&#x1F4A1;</div><div class="kg-callout-text">ExternalDNS will automatically set up records in AWS Route53. but for that, we should have hosted a zone in Route53</div></div><ul><li>It&apos;s time to trigger terraform and wait for the infrastructure to come up.  </li></ul><pre><code class="language-sh">terraform apply --auto-approve
</code></pre>
<div class="kg-card kg-callout-card kg-callout-card-blue"><div class="kg-callout-emoji">&#x26A0;&#xFE0F;</div><div class="kg-callout-text">Terraform takes approximately 20-30 minutes, so enjoy the automation &#x1F600; &#x2026;</div></div><ul><li>Once the above command executes successfully, we can go ahead and visit the AWS console to verify everything.</li></ul><figure class="kg-card kg-gallery-card kg-width-wide"><div class="kg-gallery-container"><div class="kg-gallery-row"><div class="kg-gallery-image"><img src="https://cdn.ankitjodhani.com/2024/06/2.png" width="2125" height="1454" loading="lazy" alt="Deploy a Private Docker Container Registry on Kubernetes (EKS)"></div><div class="kg-gallery-image"><img src="https://cdn.ankitjodhani.com/2024/06/3.png" width="2559" height="454" loading="lazy" alt="Deploy a Private Docker Container Registry on Kubernetes (EKS)"></div></div><div class="kg-gallery-row"><div class="kg-gallery-image"><img src="https://cdn.ankitjodhani.com/2024/06/4.png" width="2361" height="1086" loading="lazy" alt="Deploy a Private Docker Container Registry on Kubernetes (EKS)"></div><div class="kg-gallery-image"><img src="https://cdn.ankitjodhani.com/2024/06/5.png" width="2555" height="360" loading="lazy" alt="Deploy a Private Docker Container Registry on Kubernetes (EKS)"></div></div></div></figure><ul><li>Let&apos;s review all the components in Kubernetes:</li></ul><pre><code class="language-sh">aws eks list-clusters --region us-east-1

aws eks update-kubeconfig --name private-registry-eks-cluster --region us-east-1

kubectl get pods -n kube-system
</code></pre>
<figure class="kg-card kg-image-card"><img src="https://cdn.ankitjodhani.com/2024/06/6.png" class="kg-image" alt="Deploy a Private Docker Container Registry on Kubernetes (EKS)" loading="lazy" width="2254" height="1181"></figure><ul><li>Yeah!! Our infrastructure is up &amp; running. Before applying our Kubernetes manifest files, we have to create a file system inside AWS EFS, where the registry pods will store our container images so the data persists in case of pod or cluster failure.</li><li>Navigate to AWS EFS and create a file system. You can name it anything, but ensure you select the VPC where we created our infrastructure. Refer to the screenshot below for guidance.</li></ul><figure class="kg-card kg-image-card"><img src="https://cdn.ankitjodhani.com/2024/06/7-1.png" class="kg-image" alt="Deploy a Private Docker Container Registry on Kubernetes (EKS)" loading="lazy" width="808" height="584"></figure><ul><li>Once the file system is created, make a note of the file system ID, as we need to add it to the Kubernetes manifest <code>storageclass.yml</code> file located in the Kubernetes directory. Add the file system ID as shown in the image below.</li></ul><figure class="kg-card kg-image-card"><img src="https://cdn.ankitjodhani.com/2024/06/8.png" class="kg-image" alt="Deploy a Private Docker Container Registry on Kubernetes (EKS)" loading="lazy" width="2559" height="401"></figure><ul><li>You can find the <code>storageclass.yml</code> file inside the Kubernetes directory; add the file system ID as shown in the image below.</li></ul><figure class="kg-card kg-image-card"><img src="https://cdn.ankitjodhani.com/2024/06/9.png" class="kg-image" alt="Deploy a Private Docker Container Registry on Kubernetes (EKS)" loading="lazy" width="2559" height="806"></figure><ul><li>We need to modify the file system&apos;s security group. 
Go to the AWS EFS console and navigate to the network section.</li></ul><figure class="kg-card kg-image-card"><img src="https://cdn.ankitjodhani.com/2024/06/10.png" class="kg-image" alt="Deploy a Private Docker Container Registry on Kubernetes (EKS)" loading="lazy" width="2211" height="945"></figure><ul><li>Add the security group of the node so that our worker node can communicate with the file system. Without it, our worker node can&apos;t store images in the file system, and Kubernetes will not be able to bind the volume. Refer to the screenshot below for guidance.</li></ul><figure class="kg-card kg-image-card"><img src="https://cdn.ankitjodhani.com/2024/06/11.png" class="kg-image" alt="Deploy a Private Docker Container Registry on Kubernetes (EKS)" loading="lazy" width="2206" height="914"></figure><ul><li>You may want to change the <code>username</code> and <code>password</code> of your private Docker container registry. You can do that by editing the <code>kubernetes/registry-secret.yml</code> file, but your username and password must be base64-encoded.</li><li>To convert plain text to base64, visit <a href="https://www.base64encode.org/?ref=blog.ankitjodhani.com" rel="noreferrer">this site</a>.</li></ul><figure class="kg-card kg-image-card"><img src="https://cdn.ankitjodhani.com/2024/06/12.png" class="kg-image" alt="Deploy a Private Docker Container Registry on Kubernetes (EKS)" loading="lazy" width="2559" height="812"></figure><ul><li>As I discussed earlier, we should have a hosted zone inside AWS Route53 so that ExternalDNS can automatically insert or update records that point to the application load balancer.
</li></ul><figure class="kg-card kg-image-card"><img src="https://cdn.ankitjodhani.com/2024/06/13.png" class="kg-image" alt="Deploy a Private Docker Container Registry on Kubernetes (EKS)" loading="lazy" width="2493" height="398"></figure><ul><li>You have to change the domain name inside the <code>kubernetes/albingress.yml</code> file to your own domain.</li></ul><figure class="kg-card kg-image-card"><img src="https://cdn.ankitjodhani.com/2024/06/14.png" class="kg-image" alt="Deploy a Private Docker Container Registry on Kubernetes (EKS)" loading="lazy" width="2559" height="810"></figure><ul><li>With that, all the configuration is done. Now let&apos;s apply the Kubernetes manifest files. Execute the command below in your terminal:</li></ul><pre><code class="language-sh">cd eks-private-container-registry

kubectl apply -k kubernetes/
</code></pre>
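<p>By the way, instead of the encoding website mentioned earlier, you can also base64-encode the registry credentials straight from the terminal. A quick sketch, using <code>ankit</code>/<code>jodhani</code> (the example credentials used later in this post) as stand-ins for your own values:</p><pre><code class="language-sh"># -n is important: a trailing newline would change the encoded value
echo -n 'ankit' | base64      # YW5raXQ=
echo -n 'jodhani' | base64    # am9kaGFuaQ==

# And to decode, for double-checking what is in registry-secret.yml:
echo -n 'YW5raXQ=' | base64 --decode   # ankit
</code></pre>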
<figure class="kg-card kg-image-card"><img src="https://cdn.ankitjodhani.com/2024/06/15.png" class="kg-image" alt="Deploy a Private Docker Container Registry on Kubernetes (EKS)" loading="lazy" width="2111" height="771"></figure><ul><li>Let&apos;s verify all the components that have been deployed:</li></ul><pre><code class="language-sh">kubectl get pv

kubectl get pvc -n dev

kubectl get pods -n dev

kubectl get cm -n dev

kubectl get secret -n dev

kubectl get networkpolicy -n dev
</code></pre>
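<p>If you prefer a single command over eyeballing each resource, <code>kubectl wait</code> can block until the pods are actually Ready. A small sketch (the 300s timeout is an arbitrary choice; adjust it for your cluster):</p><pre><code class="language-sh"># Block until every pod in the dev namespace reports Ready
kubectl wait --for=condition=Ready pods --all -n dev --timeout=300s

# Confirm the PVC bound to the EFS-backed volume; this should print "Bound"
kubectl get pvc -n dev -o jsonpath='{.items[*].status.phase}'
</code></pre>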
<figure class="kg-card kg-image-card"><img src="https://cdn.ankitjodhani.com/2024/06/16.png" class="kg-image" alt="Deploy a Private Docker Container Registry on Kubernetes (EKS)" loading="lazy" width="2317" height="1122"></figure><ul><li>We can also verify the load balancer created by the AWS Load Balancer Controller by going to the AWS console.</li></ul><figure class="kg-card kg-image-card"><img src="https://cdn.ankitjodhani.com/2024/06/17.png" class="kg-image" alt="Deploy a Private Docker Container Registry on Kubernetes (EKS)" loading="lazy" width="2555" height="1208"></figure><ul><li>We can also see the inserted records inside our hosted zone in Route 53.</li></ul><figure class="kg-card kg-image-card"><img src="https://cdn.ankitjodhani.com/2024/06/18.png" class="kg-image" alt="Deploy a Private Docker Container Registry on Kubernetes (EKS)" loading="lazy" width="2559" height="692"></figure><ul><li>With that, our private Docker container registry is successfully set up. It&apos;s time to test it.</li></ul><h2 id="%F0%9F%A7%AA-testing">&#x1F9EA; Testing</h2><ul><li>Type your registry domain name into your favorite browser to see the user interface (in my case it is <code>images.ankit.study</code>), and enter the <code>username</code> and <code>password</code> that you&apos;ve configured.</li></ul><figure class="kg-card kg-image-card"><img src="https://cdn.ankitjodhani.com/2024/06/19.png" class="kg-image" alt="Deploy a Private Docker Container Registry on Kubernetes (EKS)" loading="lazy" width="2559" height="382"></figure><ul><li>Now, let&apos;s push some Docker container images to our private registry. You can refer to the commands below, but replace the domain name with your own.</li></ul><pre><code class="language-sh"># List all the images
docker images

# Pull the ubuntu image from Docker Hub
docker pull ubuntu

# Tag the image for your registry --------&lt;&lt;YOUR_DOMAIN_NAME&gt;&gt;
docker tag ubuntu:latest images.ankit.study/ubuntu:latest

# Login to our registry 
# docker login YOUR_DOMAIN_NAME -u USERNAME -p PASSWORD
docker login images.ankit.study -u ankit -p jodhani

# Push docker image
docker push images.ankit.study/ubuntu:latest
</code></pre>
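<p>Besides the UI, you can also verify the push through the standard Docker Registry v2 HTTP API, which the registry container exposes. A sketch using the same example domain and credentials as above (substitute your own):</p><pre><code class="language-sh"># List all repositories in the registry
curl -s -u ankit:jodhani https://images.ankit.study/v2/_catalog
# e.g. {"repositories":["ubuntu"]}

# List the tags of a specific repository
curl -s -u ankit:jodhani https://images.ankit.study/v2/ubuntu/tags/list
# e.g. {"name":"ubuntu","tags":["latest"]}
</code></pre>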
<figure class="kg-card kg-image-card"><img src="https://cdn.ankitjodhani.com/2024/06/20.png" class="kg-image" alt="Deploy a Private Docker Container Registry on Kubernetes (EKS)" loading="lazy" width="2065" height="769"></figure><ul><li>Let&apos;s review the recently pushed image in the browser.</li></ul><figure class="kg-card kg-image-card"><img src="https://cdn.ankitjodhani.com/2024/06/21.png" class="kg-image" alt="Deploy a Private Docker Container Registry on Kubernetes (EKS)" loading="lazy" width="2559" height="432"></figure><figure class="kg-card kg-image-card"><img src="https://cdn.ankitjodhani.com/2024/06/22.png" class="kg-image" alt="Deploy a Private Docker Container Registry on Kubernetes (EKS)" loading="lazy" width="2559" height="586"></figure><ul><li>Similarly, we can push multiple images to our registry. Using EFS as a persistent volume eliminates concerns about storage size because EFS is serverless and elastic.</li></ul><figure class="kg-card kg-image-card"><img src="https://cdn.ankitjodhani.com/2024/06/23.png" class="kg-image" alt="Deploy a Private Docker Container Registry on Kubernetes (EKS)" loading="lazy" width="2080" height="813"></figure><figure class="kg-card kg-image-card"><img src="https://cdn.ankitjodhani.com/2024/06/24.png" class="kg-image" alt="Deploy a Private Docker Container Registry on Kubernetes (EKS)" loading="lazy" width="2559" height="555"></figure><ul><li>We can see the occupied storage in the AWS EFS console, and we can also monitor it with AWS CloudWatch.</li></ul><figure class="kg-card kg-image-card"><img src="https://cdn.ankitjodhani.com/2024/06/25.png" class="kg-image" alt="Deploy a Private Docker Container Registry on Kubernetes (EKS)" loading="lazy" width="2247" height="1157"></figure><ul><li>We are using persistent storage, so let&apos;s put it to the test by deleting all the registry and UI pods from our cluster.</li></ul><pre><code class="language-sh"># List of Pods 
kubectl get pods -n dev

# Delete registry deployment
kubectl delete -f kubernetes/registry-deployment.yml

# Delete ui deployment
kubectl delete -f kubernetes/ui-deployment.yml

# List of Pods 
kubectl get pods -n dev
</code></pre>
<figure class="kg-card kg-image-card"><img src="https://cdn.ankitjodhani.com/2024/06/26.png" class="kg-image" alt="Deploy a Private Docker Container Registry on Kubernetes (EKS)" loading="lazy" width="2040" height="842"></figure><ul><li>Now, let&apos;s create these pods again.</li></ul><pre><code class="language-sh">kubectl apply -k kubernetes/
</code></pre>
<figure class="kg-card kg-image-card"><img src="https://cdn.ankitjodhani.com/2024/06/27.png" class="kg-image" alt="Deploy a Private Docker Container Registry on Kubernetes (EKS)" loading="lazy" width="2104" height="830"></figure><p>Once all the pods are in the Running state, we can verify the images by visiting the domain in the browser.</p><figure class="kg-card kg-image-card"><img src="https://cdn.ankitjodhani.com/2024/06/28.png" class="kg-image" alt="Deploy a Private Docker Container Registry on Kubernetes (EKS)" loading="lazy" width="2559" height="496"></figure><ul><li>Yeah!! We can see that our images are still there, which means the data was persisted.</li></ul><h2 id="%F0%9F%A7%B9-cleanup">&#x1F9F9; Cleanup</h2><ul><li>Let&apos;s destroy the infrastructure to avoid unnecessary charges.</li><li>First, remove the Kubernetes components.</li></ul><pre><code class="language-sh">kubectl delete -k kubernetes/
</code></pre>
<figure class="kg-card kg-image-card"><img src="https://cdn.ankitjodhani.com/2024/06/29.png" class="kg-image" alt="Deploy a Private Docker Container Registry on Kubernetes (EKS)" loading="lazy" width="2115" height="621"></figure><ul><li>Second, delete the file system that we&apos;ve created, via the AWS console.</li></ul><figure class="kg-card kg-image-card"><img src="https://cdn.ankitjodhani.com/2024/06/30.png" class="kg-image" alt="Deploy a Private Docker Container Registry on Kubernetes (EKS)" loading="lazy" width="2559" height="437"></figure><ul><li>Third, destroy the infrastructure.</li></ul><pre><code class="language-sh">terraform destroy --auto-approve
</code></pre>
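<p>Terraform can occasionally leave orphans behind (for example, load balancers created by the controller from inside the cluster rather than by Terraform), so a quick AWS CLI sweep of the usual suspects is a cheap sanity check. These are standard AWS CLI commands; add <code>--region</code> if your default differs from the cluster&apos;s region:</p><pre><code class="language-sh"># Any EFS file systems still around?
aws efs describe-file-systems --query 'FileSystems[].FileSystemId'

# Any load balancers left by the AWS Load Balancer Controller?
aws elbv2 describe-load-balancers --query 'LoadBalancers[].LoadBalancerName'

# NAT gateways are an easy-to-miss cost; list the active ones
aws ec2 describe-nat-gateways --filter Name=state,Values=available --query 'NatGateways[].NatGatewayId'
</code></pre>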
<figure class="kg-card kg-image-card"><img src="https://cdn.ankitjodhani.com/2024/06/31.png" class="kg-image" alt="Deploy a Private Docker Container Registry on Kubernetes (EKS)" loading="lazy" width="1507" height="361"></figure><ul><li>I recommend you visit the AWS Console &amp; verify everything to avoid unnecessary charges.</li></ul><h2 id="%F0%9F%99%8C-conclusion">&#x1F64C; Conclusion</h2><ul><li>Finally, in this blog, we explored deploying our own private container registry on an AWS EKS cluster. We started by provisioning the Amazon EKS cluster using Terraform, ensuring a solid foundation for our deployment.</li><li>I aimed to cover all the necessary details and best practices, but writing everything in a blog is not possible, so I recommend you dig deeper and check out my Terraform code and Kubernetes manifest files.</li><li>You can implement CI/CD for Terraform (a GitOps approach).</li><li>Use AWS Secrets Manager or other tools to store your secrets.</li></ul><p>And here it ends... &#x1F64C;&#x1F942;<br><br>If you like my work, please message me on LinkedIn with <strong><em>&quot;Hi and your country name&quot;</em></strong></p><p>-&#x1F64B;&#x200D;&#x2642;&#xFE0F; Ankit Jodhani.</p><p>&#x1F4E8; reach me at <strong>ankitjodhani1903@gmail.com </strong></p><div class="kg-card kg-button-card kg-align-center"><a href="https://github.com/AnkitJodhani/eks-private-container-registry.git?ref=blog.ankitjodhani.com" class="kg-btn kg-btn-accent">&#x1F446; GitHub Repository</a></div>
<!--kg-card-begin: html-->
<!DOCTYPE html>
<html lang="en">
<head>
<link href="https://unpkg.com/boxicons@2.1.4/css/boxicons.min.css" rel="stylesheet">
</head>
  <style>
 .social-box {
    display: flex;
    padding: 0px 100px;
    justify-content: space-between;
}
    
 .social-box a {
      font-size: 100px;
      text-decoration: none;
   
    }   
</style>
<body>
  <div class="social-box">
       <a class="social-links-ankit" href="https://www.linkedin.com/in/ankit-jodhani/?ref=blog.ankitjodhani.com"><i class="bx bxl-linkedin-square"> </i> </a>
    
           <a class="social-links-ankit" href="https://twitter.com/Ankit__Jodhani?ref=blog.ankitjodhani.com"><i class="bx bxl-twitter"></i> </a>
    
           <a class="social-links-ankit" href="https://github.com/AnkitJodhani?ref=blog.ankitjodhani.com"><i class="bx bxl-github"></i> </a>
  </div>
</body>
</html>

<!--kg-card-end: html-->
<h2 id="%F0%9F%8E%92-resources">&#x1F392; Resources</h2><p><a href="https://joxit.dev/docker-registry-ui/?ref=blog.ankitjodhani.com">https://joxit.dev/docker-registry-ui/</a></p><p><a href="https://hub.docker.com/r/joxit/docker-registry-ui?ref=blog.ankitjodhani.com">https://hub.docker.com/r/joxit/docker-registry-ui</a><br><br><a href="https://medium.com/clarusway/creating-a-private-container-registry-repository-and-web-service-8c753b54f55c?ref=blog.ankitjodhani.com">https://medium.com/clarusway/creating-a-private-container-registry-repository-and-web-service-8c753b54f55c</a></p>
<!--kg-card-begin: html-->
<div class="toc"></div>

<!--kg-card-end: html-->
]]></content:encoded></item></channel></rss>