mirror of
https://github.com/lupyuen/lupyuen.github.io.git
synced 2025-01-13 03:18:31 +08:00
1118 lines
No EOL
74 KiB
HTML
<!DOCTYPE html>
|
||
<html lang="en">
|
||
<head>
|
||
<meta charset="utf-8">
|
||
<meta name="viewport" content="width=device-width, initial-scale=1.0">
|
||
<meta name="generator" content="rustdoc">
|
||
<title>Optimising the Continuous Integration for Apache NuttX RTOS (GitHub Actions)</title>
|
||
|
||
|
||
<!-- Begin scripts/articles/*-header.html: Article Header for Custom Markdown files processed by rustdoc, like chip8.md -->
|
||
<meta property="og:title"
|
||
content="Optimising the Continuous Integration for Apache NuttX RTOS (GitHub Actions)"
|
||
data-rh="true">
|
||
<meta property="og:description"
|
||
content="Within Two Weeks: We squashed our GitHub Actions spending from $4,900 (weekly) down to $890. Previously: Our developers waited 2.5 Hours for a Pull Request to be checked. Now we wait at most 1.5 Hours! This article explains everything we did in the (Semi-Chaotic) Two Weeks for Apache NuttX RTOS."
|
||
data-rh="true">
|
||
<meta name="description"
|
||
content="Within Two Weeks: We squashed our GitHub Actions spending from $4,900 (weekly) down to $890. Previously: Our developers waited 2.5 Hours for a Pull Request to be checked. Now we wait at most 1.5 Hours! This article explains everything we did in the (Semi-Chaotic) Two Weeks for Apache NuttX RTOS.">
|
||
<meta property="og:image"
|
||
content="https://lupyuen.github.io/images/ci3-title.jpg">
|
||
<meta property="og:type"
|
||
content="article" data-rh="true">
|
||
<link rel="canonical"
|
||
href="https://lupyuen.org/articles/ci3.html" />
|
||
<!-- End scripts/articles/*-header.html -->
|
||
<!-- Begin scripts/rustdoc-header.html: Header for Custom Markdown files processed by rustdoc, like chip8.md -->
|
||
<link rel="alternate" type="application/rss+xml" title="RSS Feed for lupyuen" href="/rss.xml" />
|
||
<link rel="stylesheet" type="text/css" href="../normalize.css">
|
||
<link rel="stylesheet" type="text/css" href="../rustdoc.css" id="mainThemeStyle">
|
||
<link rel="stylesheet" type="text/css" href="../dark.css">
|
||
<link rel="stylesheet" type="text/css" href="../light.css" id="themeStyle">
|
||
<link rel="stylesheet" type="text/css" href="../prism.css">
|
||
<script src="../storage.js"></script><noscript>
|
||
<link rel="stylesheet" href="../noscript.css"></noscript>
|
||
<link rel="shortcut icon" href="../favicon.ico">
|
||
<style type="text/css">
|
||
#crate-search {
|
||
background-image: url("../down-arrow.svg");
|
||
}
|
||
</style>
|
||
<!-- End scripts/rustdoc-header.html -->
|
||
|
||
|
||
</head>
|
||
<body class="rustdoc">
|
||
<!--[if lte IE 8]>
|
||
<div class="warning">
|
||
This old browser is unsupported and will most likely display funky
|
||
things.
|
||
</div>
|
||
<![endif]-->
|
||
|
||
|
||
<!-- Begin scripts/rustdoc-before.html: Pre-HTML for Custom Markdown files processed by rustdoc, like chip8.md -->
|
||
|
||
<!-- Begin Theme Picker -->
|
||
<div class="theme-picker" style="left: 0"><button id="theme-picker" aria-label="Pick another theme!"><img src="../brush.svg"
|
||
width="18" alt="Pick another theme!"></button>
|
||
<div id="theme-choices"></div>
|
||
</div>
|
||
<!-- Theme Picker -->
|
||
|
||
<!-- End scripts/rustdoc-before.html -->
|
||
|
||
|
||
<h1 class="title">Optimising the Continuous Integration for Apache NuttX RTOS (GitHub Actions)</h1>
|
||
<nav id="rustdoc"><ul>
|
||
<li><a href="#rescue-plan" title="Rescue Plan">1 Rescue Plan</a><ul></ul></li>
|
||
<li><a href="#present-pains" title="Present Pains">2 Present Pains</a><ul></ul></li>
|
||
<li><a href="#disable-macos-and-windows-builds" title="Disable macOS and Windows Builds">3 Disable macOS and Windows Builds</a><ul></ul></li>
|
||
<li><a href="#move-the-merge-jobs" title="Move the Merge Jobs">4 Move the Merge Jobs</a><ul></ul></li>
|
||
<li><a href="#halve-the-ci-checks" title="Halve the CI Checks">5 Halve the CI Checks</a><ul></ul></li>
|
||
<li><a href="#live-metric-for-full-time-runners" title="Live Metric for Full-Time Runners">6 Live Metric for Full-Time Runners</a><ul></ul></li>
|
||
<li><a href="#monitor-our-ci-servers-24-x-7" title="Monitor our CI Servers 24 x 7">7 Monitor our CI Servers 24 x 7</a><ul></ul></li>
|
||
<li><a href="#final-verdict" title="Final Verdict">8 Final Verdict</a><ul></ul></li>
|
||
<li><a href="#our-wishlist" title="Our Wishlist">9 Our Wishlist</a><ul></ul></li>
|
||
<li><a href="#whats-next" title="What’s Next">10 What’s Next</a><ul></ul></li>
|
||
<li><a href="#appendix-self-hosted-github-runners" title="Appendix: Self-Hosted GitHub Runners">11 Appendix: Self-Hosted GitHub Runners</a><ul></ul></li>
|
||
<li><a href="#appendix-check-our-pr-submission" title="Appendix: Check our PR Submission">12 Appendix: Check our PR Submission</a><ul></ul></li>
|
||
<li><a href="#appendix-verify-our-pr-merge" title="Appendix: Verify our PR Merge">13 Appendix: Verify our PR Merge</a><ul></ul></li>
|
||
<li><a href="#appendix-network-timeout-at-github" title="Appendix: Network Timeout at GitHub">14 Appendix: Network Timeout at GitHub</a><ul></ul></li>
|
||
<li><a href="#appendix-build-rules-for-ci-workflow" title="Appendix: Build Rules for CI Workflow">15 Appendix: Build Rules for CI Workflow</a><ul>
|
||
<li><a href="#overall-solution" title="Overall Solution">15.1 Overall Solution</a><ul></ul></li>
|
||
<li><a href="#fetch-the-arch-labels" title="Fetch the Arch Labels">15.2 Fetch the Arch Labels</a><ul></ul></li>
|
||
<li><a href="#limit-to-simple-prs" title="Limit to Simple PRs">15.3 Limit to Simple PRs</a><ul></ul></li>
|
||
<li><a href="#identify-the-non-arm-builds" title="Identify the Non-Arm Builds">15.4 Identify the Non-Arm Builds</a><ul></ul></li>
|
||
<li><a href="#skip-the-non-arm-builds" title="Skip The Non-Arm Builds">15.5 Skip The Non-Arm Builds</a><ul></ul></li>
|
||
<li><a href="#same-for-other-builds" title="Same for Other Builds">15.6 Same for Other Builds</a><ul></ul></li>
|
||
<li><a href="#skip-the-macos-builds" title="Skip the macOS Builds">15.7 Skip the macOS Builds</a><ul></ul></li>
|
||
<li><a href="#ignore-the-docs-label" title="Ignore the Docs Label">15.8 Ignore the Docs Label</a><ul></ul></li>
|
||
<li><a href="#sync-to-nuttx-apps" title="Sync to NuttX Apps">15.9 Sync to NuttX Apps</a><ul></ul></li>
|
||
<li><a href="#actual-performance" title="Actual Performance">15.10 Actual Performance</a><ul></ul></li></ul></li></ul></nav><p>📝 <em>10 Nov 2024</em></p>
|
||
<p><img src="https://lupyuen.github.io/images/ci3-title.jpg" alt="Optimising the Continuous Integration for Apache NuttX RTOS" /></p>
|
||
<p><strong>Within Two Weeks:</strong> We squashed our GitHub Actions spending from <strong>$4,900</strong> (weekly) down to <strong>$890</strong>…</p>
|
||
<p><img src="https://lupyuen.github.io/images/ci3-beforeafter7days.jpg" alt="Within Two Weeks: We squashed our GitHub Actions spending from $4,900 (weekly) down to $890" /></p>
|
||
<p><strong>Previously:</strong> Our developers waited <strong>2.5 Hours</strong> for a Pull Request to be checked. Now we wait at most <strong>1.5 Hours</strong>! (Pic below)</p>
|
||
<p>This article explains everything we did in the (Semi-Chaotic) Two Weeks for <a href="https://nuttx.apache.org/docs/latest/index.html"><strong>Apache NuttX RTOS</strong></a>…</p>
|
||
<ul>
|
||
<li>
|
||
<p>Shut down the <strong>macOS and Windows Builds</strong>, revive them in a different form</p>
|
||
</li>
|
||
<li>
|
||
<p><strong>Merge Jobs</strong> are super costly, we moved them to the NuttX Mirror Repo</p>
|
||
</li>
|
||
<li>
|
||
<p>We <strong>Halved the CI Checks</strong> for Complex PRs. (Continuous Integration)</p>
|
||
</li>
|
||
<li>
|
||
<p><strong>Simple PRs</strong> are already quite fast. (Sometimes 12 Mins!)</p>
|
||
</li>
|
||
<li>
|
||
<p>Coding the <strong>Build Rules</strong> for our CI Workflow, monitoring our CI Servers 24 x 7</p>
|
||
</li>
|
||
<li>
|
||
<p>We can’t run <strong>All CI Checks</strong>, but NuttX Devs can help ourselves!</p>
|
||
</li>
|
||
</ul>
|
||
<p><img src="https://lupyuen.github.io/images/ci3-beforeafter.jpg" alt="Previously: Our developers waited 2.5 Hours for a Pull Request to be checked. Now we wait at most 1.5 Hours" /></p>
|
||
<h1 id="rescue-plan"><a class="doc-anchor" href="#rescue-plan">§</a>1 Rescue Plan</h1>
|
||
<p>We had <a href="https://lists.apache.org/thread/2yzv1fdf9y6pdkg11j9b4b93grb2bn0q"><strong>an ultimatum</strong></a> to reduce (drastically) our usage of GitHub Actions. Or our Continuous Integration would <strong>Halt Totally in Two Weeks</strong>!</p>
|
||
<p>After <a href="https://www.strava.com/activities/12673094079"><strong>deliberating overnight:</strong></a> We swiftly activated <a href="https://github.com/apache/nuttx/issues/14376"><strong>our rescue plan</strong></a>…</p>
|
||
<ul>
|
||
<li>
|
||
<p><strong>Submit / Update a Complex PR:</strong></p>
|
||
<p>CI Workflow shall trigger only <strong>Half the Jobs</strong> for CI Checks.</p>
|
||
<p><em>(A <strong>Complex PR</strong> affects <strong>All Architectures</strong>: Arm32, Arm64, RISC-V, Xtensa, etc. Will reduce GitHub Cost by 32%)</em></p>
|
||
</li>
|
||
<li>
|
||
<p><strong>Merge a Complex PR:</strong></p>
|
||
<p>CI Workflow shall <strong>Run All Jobs</strong> like before.</p>
|
||
<p><em>(arm-01 … arm-14, risc-v, xtensa, etc)</em></p>
|
||
</li>
|
||
<li>
|
||
<p><strong>Simple PRs:</strong></p>
|
||
<p>No change. Thus Simple Arm32 PRs shall build only <em>arm-01 … arm-14.</em></p>
|
||
<p><em>(A <strong>Simple PR</strong> concerns only <strong>One Single Architecture</strong>: Arm32 OR Arm64 OR RISC-V etc)</em></p>
|
||
</li>
|
||
<li>
|
||
<p><strong>After Merging Any PR:</strong></p>
|
||
<p>Merge Jobs shall run at <a href="https://github.com/NuttX/nuttx/actions/workflows/build.yml"><strong>NuttX Mirror Repo</strong></a>.</p>
|
||
<p><em>(Instead of OG Repo <em>apache/nuttx</em>)</em></p>
|
||
</li>
|
||
<li>
|
||
<p><strong>Two Scheduled Merge Jobs:</strong></p>
|
||
<p>Daily at <strong>00:00 UTC</strong> and <strong>12:00 UTC</strong>.</p>
|
||
<p><em>(No more On-Demand Merge Jobs)</em></p>
|
||
</li>
|
||
<li>
|
||
<p><strong>macOS and Windows Jobs:</strong></p>
|
||
<p>Shall be <strong>Totally Disabled</strong>.</p>
|
||
<p><em>(Until we find a way to manage their costs)</em></p>
|
||
</li>
|
||
</ul>
|
||
<p>We have reasons for doing all this, backed by solid data…</p>
|
||
<p><img src="https://lupyuen.github.io/images/ci3-cancel.jpg" alt="We wasted GitHub Runners on Merge Jobs that were eventually superseded and cancelled" /></p>
|
||
<h1 id="present-pains"><a class="doc-anchor" href="#present-pains">§</a>2 Present Pains</h1>
|
||
<p>We studied the CI Jobs for the previous day…</p>
|
||
<ul>
|
||
<li><a href="https://docs.google.com/spreadsheets/d/1ujGKmUyy-cGY-l1pDBfle_Y6LKMsNp7o3rbfT1UkiZE/edit?gid=0#gid=0"><strong>Analysis of CI Jobs over 24 Hours</strong></a></li>
|
||
</ul>
|
||
<p>Many CI Jobs were <strong>Incomplete</strong>: We wasted GitHub Runners on Merge Jobs that were eventually <strong>superseded and cancelled</strong> (pic above, we’ll come back to this)</p>
|
||
<p><img src="https://github.com/user-attachments/assets/953e2ac7-aee5-45c6-986c-3bcdd97d0b5e" alt="Screenshot 2024-10-17 at 1 18 14 PM" /></p>
|
||
<p><strong>Scheduled Merge Jobs</strong> will reduce wastage of GitHub Runners, since most Merge Jobs didn’t complete. Only One Merge Job completed on that day…</p>
|
||
<p><img src="https://github.com/user-attachments/assets/1452067f-a151-4641-8d1e-3c84c0f45796" alt="Screenshot 2024-10-17 at 1 16 16 PM" /></p>
|
||
<p>When we <strong>Halve the CI Jobs:</strong> We reduce the wastage of GitHub Runners…</p>
|
||
<p><img src="https://github.com/user-attachments/assets/bda5c8c3-862a-41b6-bab3-20352ba9976a" alt="Screenshot 2024-10-17 at 1 15 30 PM" /></p>
|
||
<p>This analysis was super helpful for complying with the <a href="https://infra.apache.org/github-actions-policy.html"><strong>ASF Policy for GitHub Actions</strong></a>! Next we follow through…</p>
|
||
<p><img src="https://lupyuen.github.io/images/ci3-macos.jpg" alt="Disable macOS Builds" /></p>
|
||
<h1 id="disable-macos-and-windows-builds"><a class="doc-anchor" href="#disable-macos-and-windows-builds">§</a>3 Disable macOS and Windows Builds</h1>
|
||
<p><em>Quitting the macOS Builds? That’s horribly drastic!</em></p>
|
||
<p>Yeah sorry we can’t enable <strong>macOS Builds</strong> in NuttX Repo right now…</p>
|
||
<ul>
|
||
<li>
|
||
<p>macOS Runners <a href="https://docs.github.com/en/billing/managing-billing-for-your-products/managing-billing-for-github-actions/about-billing-for-github-actions#minute-multipliers"><strong>cost 10 times</strong></a> as much as Linux Runners.</p>
|
||
<p>To enable One macOS Job: We need to disable 10 Linux Jobs! Which is not feasible.</p>
|
||
</li>
|
||
<li>
|
||
<p>Our macOS Jobs are in an <strong>untidy state</strong> right now, showing many many warnings.</p>
|
||
<p>We need someone familiar with Intel Macs to clean up the macOS Jobs.</p>
|
||
<p>(See the <a href="https://github.com/NuttX/nuttx/actions/runs/11728929385/job/32673549658#step:7:4236"><strong>macOS Log</strong></a>)</p>
|
||
</li>
|
||
<li>
|
||
<p>That’s why we moved the macOS Builds to the <a href="https://github.com/NuttX/nuttx/actions/workflows/build.yml"><strong>NuttX Mirror Repo</strong></a>, which won’t be charged to NuttX Project.</p>
|
||
<p><a href="https://github.com/apache/nuttx/issues/14598">(Discussion here)</a></p>
|
||
<p><a href="https://github.com/apache/nuttx/issues/14526">(<strong>macOS Build Farm</strong> coming soon!)</a></p>
|
||
</li>
|
||
</ul>
|
||
<p><img src="https://lupyuen.github.io/images/ci3-dashboard.png" alt="NuttX Dashboard" /></p>
|
||
<p><em>Can we still prevent breakage of ALL Builds? Linux, macOS AND Windows?</em></p>
|
||
<p>Nope this is <strong>simply impossible</strong>…</p>
|
||
<ul>
|
||
<li>
|
||
<p>In the good old days: We were using <strong>far too many</strong> GitHub Runners.</p>
|
||
<p>This is not sustainable, we don’t have the budget to do all the CI Checks we used to.</p>
|
||
</li>
|
||
<li>
|
||
<p>Hence we should expect <strong>some breakage</strong>.</p>
|
||
<p>We should be prepared to backtrack and figure out which PR broke the build.</p>
|
||
</li>
|
||
<li>
|
||
<p>That’s why we have tools like <a href="https://github.com/apache/nuttx/issues/14558"><strong>NuttX Dashboard</strong></a> (pic above), to detect breakage earlier.</p>
|
||
<p>(Without depending on GitHub CI)</p>
|
||
</li>
|
||
<li>
|
||
<p>Remember to show <strong>Love and Respect</strong> for NuttX Devs!</p>
|
||
<p>Previously we waited <a href="https://github.com/apache/nuttx/actions/runs/11308145630"><strong>2.5 Hours</strong></a> for All CI Checks. Now we wait at most <a href="https://github.com/apache/nuttx/actions/runs/11582139779"><strong>1.5 Hours</strong></a>, let’s stick to this.</p>
|
||
</li>
|
||
</ul>
|
||
<p><em>What about the Windows Builds?</em></p>
|
||
<p>Recently we <a href="https://github.com/apache/nuttx/issues/14598"><strong>re-enabled the Windows Builds</strong></a>, because they’re not as costly as macOS Builds.</p>
|
||
<p>We’ll continue to monitor our GitHub Costs. And shut down the Windows Builds if necessary.</p>
|
||
<p><a href="https://docs.github.com/en/billing/managing-billing-for-your-products/managing-billing-for-github-actions/about-billing-for-github-actions#minute-multipliers">(Windows Runners are <strong>twice the cost</strong> of Linux Runners)</a></p>
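<p>To see why One macOS Job displaces 10 Linux Jobs, here is a minimal Python sketch of the billing arithmetic. It assumes only the multipliers quoted above (macOS 10x, Windows 2x, Linux 1x). The function name and the 30-minute job are made up for illustration.</p>

```python
# Minute multipliers quoted in this article (relative to a Linux Runner)
MULTIPLIER = {"linux": 1, "windows": 2, "macos": 10}

def billable_minutes(os_name, elapsed_minutes):
    """Return the Linux-equivalent minutes charged for one job."""
    return MULTIPLIER[os_name] * elapsed_minutes

# Hypothetical example: the same 30-minute job on each platform
for os_name in ("linux", "windows", "macos"):
    print(os_name, billable_minutes(os_name, 30))
```

<p>For the 30-minute job this prints 30, 60 and 300. One macOS Job eats the budget of ten Linux Jobs, while a Windows Job costs the same as two.</p>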
|
||
<p><img src="https://lupyuen.github.io/images/ci3-merge.jpg" alt="Normally our CI Workflow will trigger a Merge Job, to verify that everything compiles OK after Merging the PR" /></p>
|
||
<h1 id="move-the-merge-jobs"><a class="doc-anchor" href="#move-the-merge-jobs">§</a>4 Move the Merge Jobs</h1>
|
||
<p><em>What are Merge Jobs? Why move them?</em></p>
|
||
<p>Suppose our NuttX Admin <strong>Merges a PR</strong>. (Pic above)</p>
|
||
<p>Normally our CI Workflow will trigger a <strong>Merge Job</strong>, to verify that everything compiles OK after Merging the PR.</p>
|
||
<p>Which means ploughing through <a href="https://lupyuen.github.io/articles/ci#one-thousand-build-targets"><strong>34 Sub-Jobs</strong></a> (2.5 elapsed hours) across <strong>All Architectures</strong>: <em>Arm32, Arm64, RISC-V, Xtensa, macOS, Windows, …</em></p>
|
||
<p>This is extremely costly, hence we decided to trigger them as <strong>Scheduled Merge Jobs</strong>. I trigger them <strong>Twice Daily</strong>: 00:00 UTC and 12:00 UTC.</p>
|
||
<p><img src="https://github.com/user-attachments/assets/617cc2fe-38ac-474f-8cd8-141d19d5b1f0" alt="Screenshot 2024-10-19 at 11 33 46 AM" /></p>
|
||
<p><em>Is there a problem?</em></p>
|
||
<p>We spent <a href="https://github.com/apache/nuttx/issues/14376#issuecomment-2423563132"><strong>One-Third</strong></a> of our GitHub Runner Minutes on Scheduled Merge Jobs! (Pic above)</p>
|
||
<p><a href="https://docs.google.com/spreadsheets/d/1ujGKmUyy-cGY-l1pDBfle_Y6LKMsNp7o3rbfT1UkiZE/edit?gid=650325940#gid=650325940"><strong>Our CI Data</strong></a> shows that the Scheduled Merge Job kept getting disrupted by Newer Merged PRs. (Pic below)</p>
|
||
<p>And when we restart a Scheduled Merge Job, we waste precious GitHub Minutes.</p>
|
||
<p><a href="https://github.com/apache/nuttx/issues/14376#issuecomment-2423563132">(<strong>101 GitHub Hours</strong> for one single Scheduled Merge Job!)</a></p>
|
||
<p><img src="https://lupyuen.github.io/images/ci3-before.jpg" alt="Merge Job kept getting disrupted by Newer Merged PRs" /></p>
|
||
<p><em>Our Merge Jobs are overwhelming!</em></p>
|
||
<p>Yep this is clearly not sustainable. We moved the Scheduled Merge Jobs to a new <a href="https://github.com/NuttX/nuttx/actions/workflows/build.yml"><strong>NuttX Mirror Repo</strong></a>. (Pic below)</p>
|
||
<p>Where the Merge Jobs can run free <strong>without disruption</strong>.</p>
|
||
<p><a href="https://github.com/NuttX">(In an <strong>Unpaid GitHub Org Account</strong>, not charged to NuttX Project)</a></p>
|
||
<p><img src="https://lupyuen.github.io/images/ci3-title.jpg" alt="Optimising the Continuous Integration for Apache NuttX RTOS" /></p>
|
||
<p><em>What about the Old Merge Jobs?</em></p>
|
||
<p>Initially I ran a script to quickly <a href="https://github.com/lupyuen/nuttx-release/blob/main/kill-push-master.sh"><strong>Cancel any Merge Jobs</strong></a> that appeared in NuttX Repo and NuttX Apps.</p>
|
||
<p>Eventually we disabled the <a href="https://github.com/apache/nuttx/pull/14618"><strong>Merge Jobs for NuttX Repo</strong></a>.</p>
|
||
<p><a href="https://github.com/apache/nuttx-apps/pull/2817">(And for <strong>NuttX Apps</strong>)</a></p>
|
||
<p><a href="https://github.com/apache/nuttx/issues/14407">(Restoring <strong>Auto-Build on Sync</strong>)</a></p>
|
||
<p><em>How to trigger the Scheduled Merge Job?</em></p>
|
||
<p>Every Day at <strong>00:00 UTC</strong> and <strong>12:00 UTC</strong>: I do this…</p>
|
||
<ol>
|
||
<li>
|
||
<p>Browse to the <a href="https://github.com/NuttX/nuttx"><strong>NuttX Mirror Repo</strong></a></p>
|
||
</li>
|
||
<li>
|
||
<p>Click “<strong>Sync Fork > Discard Commits</strong>”</p>
|
||
</li>
|
||
<li>
|
||
<p>Which will <strong>Sync our Mirror Repo</strong> based on the Upstream NuttX Repo</p>
|
||
</li>
|
||
<li>
|
||
<p>Run this script to enable the <strong>macOS Builds</strong>: <a href="https://github.com/lupyuen/nuttx-release/blob/main/enable-macos-windows.sh">enable-macos-windows.sh</a></p>
|
||
<p><a href="https://github.com/lupyuen/nuttx-release/blob/main/sync-build-ingest.sh">(<strong>UPDATE:</strong> We now use <strong>sync-build-ingest.sh</strong>)</a></p>
|
||
</li>
|
||
<li>
|
||
<p>Which will also <a href="https://github.com/lupyuen/nuttx-release/blob/main/enable-macos-windows.sh#L35-L55"><strong>Disable Fail-Fast</strong></a> and grind through all builds. <a href="https://github.com/NuttX/nuttx/commit/31aea70d52d1eb6138912619f835693008596eca">(Regardless of error, pic below)</a></p>
|
||
</li>
|
||
<li>
|
||
<p>And <a href="https://github.com/lupyuen/nuttx-release/blob/main/enable-macos-windows.sh#L35-L55"><strong>Remove Max Parallel</strong></a> to use unlimited concurrent runners. <a href="https://github.com/NuttX/nuttx/commit/31aea70d52d1eb6138912619f835693008596eca">(Because it’s free! Pic below)</a></p>
|
||
</li>
|
||
<li>
|
||
<p>If the Merge Job fails with a <a href="https://lupyuen.github.io/articles/ci3#appendix-network-timeout-at-github"><strong>Mystifying Network Timeout</strong></a>: I restart the Failed Sub-Jobs. <a href="https://github.com/apache/nuttx/issues/14680">(<strong>CI Test</strong> might overrun)</a></p>
|
||
</li>
|
||
<li>
|
||
<p>Wait for the Merge Job to complete. Then <a href="https://github.com/lupyuen/ingest-nuttx-builds"><strong>Ingest the GitHub Logs</strong></a> (like an Amoeba) into our <a href="https://github.com/apache/nuttx/issues/14558"><strong>NuttX Dashboard</strong></a>. (Next article)</p>
|
||
</li>
|
||
<li>
|
||
<p>Track down any bugs that <a href="https://github.com/apache/nuttx/issues/14796"><strong>Fail the Merge Job</strong></a>.</p>
|
||
</li>
|
||
</ol>
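<p>The steps for Fail-Fast and Max Parallel above can be sketched in Python. This is only an illustration, not the actual logic of <em>enable-macos-windows.sh</em>. It assumes the workflow sets <em>fail-fast</em> and <em>max-parallel</em> (both are standard GitHub Actions <em>strategy</em> keys) on lines of their own. The function name and the sample workflow fragment are hypothetical.</p>

```python
import re

def relax_strategy(workflow_text):
    """Set fail-fast to false and drop any max-parallel line."""
    out = []
    for line in workflow_text.splitlines():
        if re.match(r"\s*max-parallel\s*:", line):
            continue  # No max-parallel means no cap on concurrent runners
        # Force fail-fast off, so that one failed build won't cancel the rest
        line = re.sub(r"^(\s*fail-fast\s*:).*$", r"\1 false", line)
        out.append(line)
    return "\n".join(out)

# Hypothetical workflow fragment, not copied from the NuttX workflow
sample = """\
strategy:
  fail-fast: true
  max-parallel: 12
  matrix:
    boards: [arm-01, arm-02]"""
print(relax_strategy(sample))
```

<p>One limitation of this sketch: If the workflow has no <em>fail-fast</em> line at all, nothing is added, and GitHub will fall back to its default (Fail-Fast enabled).</p>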
|
||
<p><img src="https://lupyuen.github.io/images/ci3-workflow.png" alt="Disable Fail-Fast and Remove Max Parallel" /></p>
|
||
<p><em>Is it really OK to Disable the Merge Jobs? What about Docs and Docker Builds?</em></p>
|
||
<ul>
|
||
<li>
|
||
<p><strong>Docker Builds:</strong> When <a href="https://github.com/apache/nuttx/blob/master/tools/ci/docker/linux/Dockerfile"><strong>Dockerfile</strong></a> is updated, it will trigger the CI Workflow <a href="https://github.com/apache/nuttx/blob/master/.github/workflows/docker_linux.yml"><strong>docker_linux.yml</strong></a>. Which is not affected by this new setup, and will continue to execute. (Exactly like before)</p>
|
||
</li>
|
||
<li>
|
||
<p><strong>Documentation:</strong> When the docs are updated, they are published to NuttX Website via the CI Workflow <a href="https://github.com/apache/nuttx-website/blob/master/.github/workflows/main.yml"><strong>main.yml</strong></a> from the NuttX Website repo (scheduled daily). Which is not affected by our grand plan.</p>
|
||
</li>
|
||
<li>
|
||
<p><strong>Release Branch:</strong> Merging a PR to the Release Branch will still run the PR Merge Job (exactly like before). <a href="https://github.com/apache/nuttx/issues/14062#issuecomment-2406373748"><strong>Release Branch</strong></a> shall always be verified through <a href="https://github.com/apache/nuttx/blob/master/.github/workflows/build.yml#L14-L26"><strong>Complete CI Checks</strong></a>.</p>
|
||
<p><a href="https://github.com/apache/nuttx/pull/14618">(More about this)</a></p>
|
||
</li>
|
||
</ul>
|
||
<p><em>Isn’t this cheating? Offloading to a Free GitHub Account?</em></p>
|
||
<p>Yeah that’s why we need a <a href="https://lupyuen.github.io/articles/ci3#our-wishlist"><strong>NuttX Build Farm</strong></a>. (Details below)</p>
|
||
<p><img src="https://lupyuen.github.io/images/ci3-checks.png" alt="Halve the CI Checks for a Complex PR" /></p>
|
||
<h1 id="halve-the-ci-checks"><a class="doc-anchor" href="#halve-the-ci-checks">§</a>5 Halve the CI Checks</h1>
|
||
<p><a href="https://github.com/apache/nuttx/issues/15451#issuecomment-2576576664">(<strong>Update:</strong> Right now we run <strong>100% of CI Jobs</strong> for Complex PRs)</a></p>
|
||
<p><em>One-Third of our GitHub Runner Minutes were spent on Merge Jobs. What about the rest?</em></p>
|
||
<p><a href="https://github.com/apache/nuttx/issues/14376#issuecomment-2423563132"><strong>Two-Thirds</strong></a> of our GitHub Runner Minutes were spent on validating <strong>New and Updated PRs</strong>.</p>
|
||
<p>Hence we’re skipping <strong>Half the CI Checks</strong> for Complex PRs.</p>
|
||
<p>(A <strong>Complex PR</strong> affects <strong>All Architectures</strong>: <em>Arm32, Arm64, RISC-V, Xtensa, etc</em>)</p>
|
||
<p><em>Which CI Checks did we select?</em></p>
|
||
<p>Today we start only these <strong>CI Checks</strong> when submitting or updating a Complex PR (pic above)</p>
|
||
<ul>
|
||
<li><em>arm-03, 05, 06, 07, 08, 10, 13</em></li>
|
||
<li><em>risc-v-01, 02, 03</em></li>
|
||
<li><em>sim-01, 02</em></li>
|
||
<li><em>xtensa-01, arm64-01, x86_64-01, other</em></li>
|
||
</ul>
|
||
<p><a href="https://github.com/apache/nuttx/pull/14602">(See the <strong>Pull Request</strong>)</a></p>
|
||
<p><a href="https://github.com/apache/nuttx-apps/pull/2813">(Synced to <strong>NuttX Apps</strong>)</a></p>
|
||
<p><em>Why did we choose these CI Checks?</em></p>
|
||
<p>We selected the CI Checks above because they validate NuttX Builds on <strong>Popular Boards</strong> (and for special tests)</p>
|
||
<div><table><thead><tr><th style="text-align: left">Target Group</th><th style="text-align: left">Board / Test</th></tr></thead><tbody>
|
||
<tr><td style="text-align: left"><em>arm-01</em></td><td style="text-align: left">Sony Spresense (TODO)</td></tr>
|
||
<tr><td style="text-align: left"><em>arm-05</em></td><td style="text-align: left">Nordic nRF52</td></tr>
|
||
<tr><td style="text-align: left"><em>arm-06</em></td><td style="text-align: left">Raspberry Pi RP2040</td></tr>
|
||
<tr><td style="text-align: left"><em>arm-07</em></td><td style="text-align: left">Microchip SAMD</td></tr>
|
||
<tr><td style="text-align: left"><em>arm-08, 10, 13</em></td><td style="text-align: left">STM32</td></tr>
|
||
<tr><td style="text-align: left"><em>risc-v-02, 03</em></td><td style="text-align: left">ESP32-C3, C6, H2</td></tr>
|
||
<tr><td style="text-align: left"><em>sim-01, 02</em></td><td style="text-align: left">CI Test, Matter</td></tr>
|
||
</tbody></table>
|
||
</div>
|
||
<p>We might <a href="https://github.com/apache/nuttx/pull/14602"><strong>rotate the list</strong></a> above to get better CI Coverage.</p>
|
||
<p><a href="https://docs.google.com/spreadsheets/d/1OdBxe30Sw3yhH0PyZtgmefelOL56fA6p26vMgHV0MRY/edit?gid=0#gid=0">(See the Complete List of <strong>CI Builds</strong>)</a></p>
|
||
<p><a href="https://github.com/apache/nuttx/pull/14681#issuecomment-2471703480">(Sorry we can’t run <strong>xtensa-02</strong> and <strong>arm-01</strong>)</a></p>
|
||
<p><img src="https://lupyuen.github.io/images/ci3-pr.jpg" alt="Complex PR vs Simple PR" /></p>
|
||
<p><em>What about Simple PRs?</em></p>
|
||
<p>A <strong>Simple PR</strong> concerns only <strong>One Single Architecture</strong>: <em>Arm32 OR Arm64 OR RISC-V OR Xtensa etc.</em></p>
|
||
<p>When we create a Simple PR for Arm32: It will trigger only the CI Checks for <em>arm-01</em> … <em>arm-14</em>.</p>
|
||
<p>Which will <a href="https://lupyuen.codeberg.page/articles/ci3.html#actual-performance"><strong>complete earlier</strong></a> than a Complex PR.</p>
|
||
<p><a href="https://lupyuen.codeberg.page/articles/ci3.html#actual-performance">(<strong>x86_64 Devs</strong> are the happiest. Their PRs complete in <strong>10 Mins</strong>!)</a></p>
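<p>The selection described so far can be sketched in Python. This is a simplified illustration, not the actual Build Rules in our CI Workflow. It assumes each PR carries a set of architecture labels (the plain names like <em>arm</em> are hypothetical), and it treats any PR without exactly one label as a Complex PR. The job list is partial: <em>arm-01</em> to <em>arm-14</em>, plus the other jobs named above.</p>

```python
# Jobs started for a Complex PR, as listed in this article
COMPLEX_PR_JOBS = [
    "arm-03", "arm-05", "arm-06", "arm-07", "arm-08", "arm-10", "arm-13",
    "risc-v-01", "risc-v-02", "risc-v-03",
    "sim-01", "sim-02",
    "xtensa-01", "arm64-01", "x86_64-01", "other",
]

def job_group(job):
    """Strip the trailing number: 'arm-03' becomes 'arm', 'arm64-01' becomes 'arm64'."""
    return job.rsplit("-", 1)[0]

def select_jobs(arch_labels, all_jobs):
    """Pick the CI Jobs for a PR, given its set of architecture labels."""
    if len(arch_labels) == 1:
        # Simple PR: build every job of that one architecture
        (arch,) = arch_labels
        return [job for job in all_jobs if job_group(job) == arch]
    # Complex PR: build only the reduced list
    return [job for job in all_jobs if job in COMPLEX_PR_JOBS]

# Illustrative job list: arm-01 to arm-14, plus the other jobs named above
ALL_JOBS = sorted(set([f"arm-{n:02d}" for n in range(1, 15)] + COMPLEX_PR_JOBS))

print(select_jobs({"arm"}, ALL_JOBS))            # Simple Arm32 PR
print(select_jobs({"arm", "risc-v"}, ALL_JOBS))  # Complex PR
```

<p>Comparing the whole group name (instead of a prefix) matters here: A Simple Arm32 PR must not pull in <em>arm64-01</em>.</p>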
|
||
<p><em>Sounds awfully complicated. How did we code the rules?</em></p>
|
||
<p>Indeed! The Build Rules are explained here…</p>
|
||
<ul>
|
||
<li><a href="https://lupyuen.github.io/articles/ci3#appendix-build-rules-for-ci-workflow">“<strong>Build Rules for CI Workflow</strong>”</a></li>
|
||
</ul>
|
||
<h1 id="live-metric-for-full-time-runners"><a class="doc-anchor" href="#live-metric-for-full-time-runners">§</a>6 Live Metric for Full-Time Runners</h1>
|
||
<p><em>Hitting the Target Metrics in 2 weeks… Everyone needs to help out right?</em></p>
|
||
<p>Our quota is <a href="https://infra.apache.org/github-actions-policy.html"><strong>25 Full-Time GitHub Runners</strong></a> per day.</p>
|
||
<p>We published our own <strong>Live Metric for Full-Time Runners</strong>, for everyone to track…</p>
|
||
<p><img src="https://lupyuen.github.io/nuttx-metrics/github-fulltime-runners.png" alt="Live Metric for Full-Time Runners" /></p>
|
||
<ul>
|
||
<li>
|
||
<p><strong>Date:</strong> We compute the Full-Time Runners only for Today’s Date (UTC)</p>
|
||
</li>
|
||
<li>
|
||
<p><strong>Elapsed Hours:</strong> Number of hours elapsed since 00:00 UTC</p>
|
||
</li>
|
||
<li>
|
||
<p><strong>GitHub Job Hours:</strong> Elapsed Duration of all GitHub Jobs at NuttX Repo and NuttX Apps. <em>(Cancelled / Completed / Failed)</em></p>
|
||
<p>This data is available only AFTER the job has been Cancelled / Completed / Failed. (So it might lag by 1.5 hours)</p>
|
||
<p>But this is the <em>Elapsed Job Duration</em>. It doesn’t say that we’re running 8 Sub-Jobs in parallel. That’s why we need…</p>
|
||
</li>
|
||
<li>
|
||
<p><strong>GitHub Runner Hours:</strong> Number of GitHub Runners * Job Duration. Effectively the <em>Chargeable Minutes</em> by GitHub.</p>
|
||
<p>We compute this as 8 * GitHub Job Hours. This is <a href="https://docs.google.com/spreadsheets/d/1ujGKmUyy-cGY-l1pDBfle_Y6LKMsNp7o3rbfT1UkiZE/edit?gid=1163309346#gid=1163309346"><strong>averaged from past data</strong></a>.</p>
|
||
<p>(Remember: One GitHub Runner will run One Single Sub-Job, like <em>arm-01</em>)</p>
|
||
</li>
|
||
<li>
|
||
<p><strong>Full-Time GitHub Runners:</strong> Equals GitHub Runner Hours / Elapsed Hours.</p>
|
||
<p>It means <em>“How many GitHub Runners, running Full-Time, would we need to consume the GitHub Runner Hours?”</em></p>
|
||
<p>(We should keep this below 25 per day, per week, per month)</p>
|
||
</li>
|
||
</ul>
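<p>The definitions above boil down to a short formula. Here is a minimal Python sketch, assuming the factor of 8 Runners per Job stated above. The function names and sample figures are hypothetical, this is not the code in our scripts.</p>

```python
from datetime import datetime, timezone

RUNNERS_PER_JOB = 8  # Average from past data, as stated in this article

def full_time_runners(job_hours, elapsed_hours):
    """Estimate the Full-Time GitHub Runners for today (UTC)."""
    runner_hours = RUNNERS_PER_JOB * job_hours
    return runner_hours / elapsed_hours

def elapsed_hours_today(now):
    """Hours elapsed since 00:00 UTC for a timezone-aware datetime."""
    now = now.astimezone(timezone.utc)
    return now.hour + now.minute / 60 + now.second / 3600

# Hypothetical figures: 30 GitHub Job Hours logged by 12:00 UTC
now = datetime(2024, 11, 10, 12, 0, tzinfo=timezone.utc)
print(full_time_runners(30, elapsed_hours_today(now)))
```

<p>With these sample figures: 30 Job Hours become 240 Runner Hours, spread over 12 Elapsed Hours, which prints 20.0 Full-Time Runners. That’s under our quota of 25. (The sketch doesn’t guard against running at exactly 00:00 UTC, when Elapsed Hours is zero)</p>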
|
||
<p>We publish the data every <strong>15 minutes</strong>…</p>
|
||
<ol>
|
||
<li>
|
||
<p><a href="https://github.com/lupyuen/nuttx-release/blob/main/compute-github-runners.sh"><strong>compute-github-runners.sh</strong></a> calls GitHub API to add up the <strong>Elapsed Duration</strong> of All Completed GitHub Jobs for today.</p>
|
||
<p>Then it extrapolates the Number of <strong>Full-Time GitHub Runners</strong>.</p>
|
||
<p>(1 GitHub Job Hour roughly equals 8 GitHub Runner Hours, which equals 8 Runners running Full-Time for that hour)</p>
|
||
</li>
|
||
<li>
|
||
<p><a href="https://github.com/lupyuen/nuttx-metrics/blob/main/run.sh"><strong>run.sh</strong></a> calls the script above and renders the Full-Time GitHub Runners as a PNG.</p>
|
||
<p>(Thanks to ImageMagick)</p>
|
||
</li>
|
||
<li>
|
||
<p><a href="https://github.com/lupyuen/nuttx-release/blob/main/compute-github-runners2.sh"><strong>compute-github-runners2.sh</strong></a>: Is the Linux Version of the above macOS Script.</p>
|
||
<p>(But less accurate, due to BC Rounding)</p>
|
||
</li>
|
||
</ol>
|
||
<p>Next comes the Watchmen…</p>
|
||
<p><a href="https://lupyuen.github.io/articles/ci3#appendix-self-hosted-github-runners">(Can we run <strong>All CI Checks</strong> for All PRs?)</a></p>
|
||
<p><img src="https://github.com/user-attachments/assets/e25badb4-112b-4392-8605-7427aee47b89" alt="PXL_20241020_114213194" /></p>
|
||
<h1 id="monitor-our-ci-servers-24-x-7"><a class="doc-anchor" href="#monitor-our-ci-servers-24-x-7">§</a>7 Monitor our CI Servers 24 x 7</h1>
|
||
<p><em>Doesn’t sound right that an Unpaid Volunteer is monitoring our CI Servers 24 x 7 … But someone’s gotta do it!</em> 👍</p>
|
||
<p>This runs on a 4K TV (Xiaomi 65-inch) all day, all night…</p>
|
||
<p><img src="https://github.com/user-attachments/assets/3f862ed6-8890-4d00-99e1-f5b8352ddcd1" alt="Screenshot 2024-10-28 at 1 53 26 PM" /></p>
|
||
<p><a href="https://www.strava.com/activities/12737067287"><strong>On Overnight Hikes</strong></a>: I check my phone at every water break…</p>
|
||
<p><img src="https://github.com/user-attachments/assets/88232734-aecc-4af8-bc0e-641db1cfdf9e" alt="GridArt_20241028_150938083" /></p>
|
||
<p><em>If something goes wrong?</em></p>
|
||
<p>We have GitHub Scripts for <strong>Termux Android</strong>. Remember to <em>“pkg install gh”</em> and set <em>GITHUB_TOKEN</em>…</p>
|
||
<ul>
|
||
<li>
|
||
<p><a href="https://github.com/lupyuen/nuttx-release/blob/main/enable-macos-windows2.sh"><strong>enable-macos-windows2.sh</strong></a>: Enable the macOS Builds in the NuttX Mirror Repo</p>
|
||
</li>
|
||
<li>
|
||
<p><a href="https://github.com/lupyuen/nuttx-release/blob/main/compute-github-runners2.sh"><strong>compute-github-runners2.sh</strong></a>: Compute the number of Full-Time GitHub Runners for the day (less accurately than macOS version)</p>
|
||
</li>
|
||
<li>
|
||
<p><a href="https://github.com/lupyuen/nuttx-release/blob/main/kill-push-master.sh"><strong>kill-push-master.sh</strong></a>: Cancel all Merge Jobs in NuttX Repo and NuttX Apps</p>
|
||
</li>
|
||
</ul>
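<p>For illustration, here is a minimal sketch of how a script like <em>kill-push-master.sh</em> might cancel the running Merge Jobs with GitHub CLI. This is not the actual script: the function name <em>cancel_push_master</em> is hypothetical, and we assume that Merge Jobs are the workflow runs triggered by a push to the <em>master</em> branch of <em>apache/nuttx</em> and <em>apache/nuttx-apps</em>…</p>

```shell
# Sketch only (not the real kill-push-master.sh).
# Assumes GitHub CLI "gh" is installed and GITHUB_TOKEN is set.

# cancel_push_master REPO: cancel the in-progress runs triggered by a push to master
cancel_push_master() {
  repo=$1

  # List the Run IDs of the in-progress workflows (push to master)
  ids=$(gh run list \
    --repo "$repo" \
    --branch master \
    --event push \
    --status in_progress \
    --json databaseId \
    --jq '.[].databaseId') || return 1

  # Cancel each run, and carry on if one of them can't be cancelled
  for id in $ids; do
    echo "Cancelling run $id in $repo"
    gh run cancel "$id" --repo "$repo" || echo "Unable to cancel run $id"
  done
}

# Usage (commented out, because it talks to GitHub):
# cancel_push_master apache/nuttx
# cancel_push_master apache/nuttx-apps
```

<p>Wrapping the logic in a function means nothing gets cancelled until we call it explicitly, which is handy when we're typing on a phone.</p>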
|
||
<h1 id="final-verdict"><a class="doc-anchor" href="#final-verdict">§</a>8 Final Verdict</h1>
|
||
<p>It’s past Diwali and Halloween and Elections… Our CI Servers are still alive. <strong>We made it yay!</strong> 🎉</p>
|
||
<p><strong>Within Two Weeks:</strong> We squashed our GitHub Actions spending from <strong>$4,900</strong> (weekly) down to <strong>$890</strong>…</p>
|
||
<p><img src="https://lupyuen.github.io/images/ci3-beforeafter7days.jpg" alt="Within Two Weeks: We squashed our GitHub Actions spending from $4,900 (weekly) down to $890" /></p>
|
||
<p><strong>“Monthly Bill”</strong> for GitHub Actions used to be <strong>$18K</strong>…</p>
|
||
<p><img src="https://lupyuen.github.io/images/ci3-before30days.png" alt="Monthly Bill for GitHub Actions used to be $18K" /></p>
|
||
<p>Presently our <strong>Monthly Bill is $9.8K</strong>. Slashed by half (almost) and still dropping! Thank you everyone for making this happen! 🙏</p>
|
||
<p><img src="https://lupyuen.github.io/images/ci3-after30days.png" alt="Right now our Monthly Bill is $9.8K" /></p>
|
||
<p>(<strong>At Mid Nov 2024:</strong> Monthly Bill is now <strong>$3.1K</strong> 🎉)</p>
|
||
<p><strong>Bonus Love & Respect:</strong> Previously our devs waited <strong>2.5 Hours</strong> for a Pull Request to be checked. Now we wait at most <strong>1.5 Hours</strong>!</p>
|
||
<p><img src="https://lupyuen.github.io/images/ci3-sync.jpg" alt="Tired Fingers syncing the NuttX Repo to NuttX Mirror Repo" /></p>
|
||
<h1 id="our-wishlist"><a class="doc-anchor" href="#our-wishlist">§</a>9 Our Wishlist</h1>
|
||
<p><em>Everything is hunky dory?</em></p>
|
||
<p>Trusting a <strong>Single Provider for Continuous Integration</strong> is a terrible thing. We got plenty more to do…</p>
|
||
<ul>
|
||
<li>
|
||
<p>Become more resilient and self-sufficient with <a href="https://lupyuen.codeberg.page/articles/ci2.html"><strong>Our Own Build Farm</strong></a></p>
|
||
<p>(Away from GitHub)</p>
|
||
</li>
|
||
<li>
|
||
<p>Analyse our Build Logs with <a href="https://github.com/apache/nuttx/issues/14558"><strong>Our Own Tools</strong></a></p>
|
||
<p>(Instead of GitHub)</p>
|
||
</li>
|
||
<li>
|
||
<p>Excellent Initiative by <a href="https://github.com/raiden00pl"><strong>Mateusz Szafoni</strong></a>: We <a href="https://github.com/apache/nuttx/pull/14410"><strong>Merge Multiple Targets</strong></a> into One Target</p>
|
||
<p>(And cut the Build Time)</p>
|
||
</li>
|
||
</ul>
|
||
<p><a href="https://github.com/apache/nuttx/issues/14558">🙏🙏🙏 Please join <strong>Your Ubuntu PC</strong> to our Build Farm! 🙏🙏🙏</a></p>
|
||
<p><em>But our Merge Jobs are still running in a Free Account?</em></p>
|
||
<p>We learnt a Painful Lesson today: <strong>Freebies Won’t Last Forever!</strong></p>
|
||
<p>We should probably maintain an official <strong>Paid GitHub Org Account</strong> to execute our Merge Jobs…</p>
|
||
<ol>
|
||
<li>
|
||
<p>New GitHub Org shall be sponsored by our generous <strong>Stakeholder Companies</strong></p>
|
||
<p>(Espressif, Sony, Xiaomi, …)</p>
|
||
</li>
|
||
<li>
|
||
<p>New GitHub Org shall be maintained by a <strong>Paid Employee</strong> of our Stakeholder Companies</p>
|
||
<p>(Instead of an Unpaid Volunteer)</p>
|
||
</li>
|
||
<li>
|
||
<p>Which means clicking Twice Per Day to trigger the <a href="https://lupyuen.codeberg.page/articles/ci3.html#move-the-merge-jobs"><strong>Scheduled Merge Jobs</strong></a></p>
|
||
<p>(My fingers are tired, pic above)</p>
|
||
<p><a href="https://github.com/lupyuen/nuttx-release/blob/main/sync-build-ingest.sh">(<strong>UPDATE:</strong> We now use <strong>sync-build-ingest.sh</strong>)</a></p>
|
||
</li>
|
||
<li>
|
||
<p>And restarting the <strong>Failed Merge Jobs</strong></p>
|
||
<p><a href="https://lupyuen.github.io/articles/ci3#appendix-network-timeout-at-github">(Because of <strong>Mysterious Network Timeouts</strong>)</a></p>
|
||
<p><a href="https://github.com/apache/nuttx/issues/14680">(<strong>CI Test</strong> might overrun)</a></p>
|
||
</li>
|
||
<li>
|
||
<p>Track down any bugs that <a href="https://github.com/apache/nuttx/issues/14796"><strong>Fail the Merge Job</strong></a></p>
|
||
</li>
|
||
<li>
|
||
<p>New GitHub Org shall host the Official Downloads of <strong>NuttX Compiled Binaries</strong></p>
|
||
<p>(For upcoming <strong>Board Testing Farm</strong>)</p>
|
||
</li>
|
||
<li>
|
||
<p>New GitHub Org will eventually <strong>Offload CI Checks</strong> from our NuttX Repos</p>
|
||
<p>(Maybe do macOS CI Checks for PRs)</p>
|
||
</li>
|
||
</ol>
|
||
<p><img src="https://lupyuen.github.io/images/ci3-title.jpg" alt="Optimising the Continuous Integration for Apache NuttX RTOS" /></p>
|
||
<h1 id="whats-next"><a class="doc-anchor" href="#whats-next">§</a>10 What’s Next</h1>
|
||
<p>Next Article: We’ll chat about <strong>NuttX Dashboard</strong>. And how we made it with Grafana and Prometheus…</p>
|
||
<ul>
|
||
<li>
|
||
<p><a href="https://lupyuen.github.io/articles/ci4"><strong>“Continuous Integration Dashboard for Apache NuttX RTOS (Prometheus and Grafana)”</strong></a></p>
|
||
</li>
|
||
<li>
|
||
<p><a href="https://lupyuen.github.io/articles/ci5"><strong>“macOS Build Farm for Apache NuttX RTOS (Apple Silicon)”</strong></a></p>
|
||
</li>
|
||
<li>
|
||
<p><a href="https://lupyuen.github.io/articles/ci6"><strong>“Rewinding a Build for Apache NuttX RTOS (Docker)”</strong></a></p>
|
||
</li>
|
||
<li>
|
||
<p><a href="https://lupyuen.github.io/articles/ci7"><strong>“Failing a Continuous Integration Test for Apache NuttX RTOS (QEMU RISC-V)”</strong></a></p>
|
||
</li>
|
||
<li>
|
||
<p><a href="https://lupyuen.github.io/articles/mastodon"><strong>“(Experimental) Mastodon Server for Apache NuttX Continuous Integration (macOS Rancher Desktop)”</strong></a></p>
|
||
</li>
|
||
<li>
|
||
<p><a href="https://lupyuen.org/articles/bisect.html"><strong>“Git Bisecting a Bug (Apache NuttX RTOS)”</strong></a></p>
|
||
</li>
|
||
<li>
|
||
<p><a href="https://lupyuen.org/articles/forgejo.html"><strong>“Forgejo Git Forge for Apache NuttX RTOS (Experimental)”</strong></a></p>
|
||
</li>
|
||
</ul>
|
||
<p>Many Thanks to the awesome <strong>NuttX Admins</strong> and <strong>NuttX Devs</strong>! I couldn’t have survived the two chaotic and stressful weeks without your help. And my <a href="https://lupyuen.github.io/articles/sponsor"><strong>GitHub Sponsors</strong></a>, for sticking with me all these years.</p>
|
||
<ul>
|
||
<li>
|
||
<p><a href="https://lupyuen.github.io/articles/sponsor"><strong>Sponsor me a coffee</strong></a></p>
|
||
</li>
|
||
<li>
|
||
<p><a href="https://news.ycombinator.com/item?id=42097212"><strong>Discuss this article on Hacker News</strong></a></p>
|
||
</li>
|
||
<li>
|
||
<p><a href="https://github.com/lupyuen/nuttx-sg2000"><strong>My Current Project: “Apache NuttX RTOS for Sophgo SG2000”</strong></a></p>
|
||
</li>
|
||
<li>
|
||
<p><a href="https://github.com/lupyuen/nuttx-ox64"><strong>My Other Project: “NuttX for Ox64 BL808”</strong></a></p>
|
||
</li>
|
||
<li>
|
||
<p><a href="https://github.com/lupyuen/nuttx-star64"><strong>Older Project: “NuttX for Star64 JH7110”</strong></a></p>
|
||
</li>
|
||
<li>
|
||
<p><a href="https://github.com/lupyuen/pinephone-nuttx"><strong>Olderer Project: “NuttX for PinePhone”</strong></a></p>
|
||
</li>
|
||
<li>
|
||
<p><a href="https://lupyuen.github.io"><strong>Check out my articles</strong></a></p>
|
||
</li>
|
||
<li>
|
||
<p><a href="https://lupyuen.github.io/rss.xml"><strong>RSS Feed</strong></a></p>
|
||
</li>
|
||
</ul>
|
||
<p><em>Got a question, comment or suggestion? Create an Issue or submit a Pull Request here…</em></p>
|
||
<p><a href="https://github.com/lupyuen/lupyuen.github.io/blob/master/src/ci3.md"><strong>lupyuen.github.io/src/ci3.md</strong></a></p>
|
||
<h1 id="appendix-self-hosted-github-runners"><a class="doc-anchor" href="#appendix-self-hosted-github-runners">§</a>11 Appendix: Self-Hosted GitHub Runners</h1>
|
||
<p><em>To run the Complete Suite of CI Checks on every PR… We could use Self-Hosted GitHub Runners?</em></p>
|
||
<p>Yep I tested Self-Hosted GitHub Runners, and I wrote about my experience here: <a href="https://lupyuen.github.io/articles/ci"><strong>“Continuous Integration for Apache NuttX RTOS”</strong></a></p>
|
||
<ul>
|
||
<li>
|
||
<p><strong>Self-Hosted GitHub Runners</strong> are actually quite complex to set up. And the machine needs to be <a href="https://cwiki.apache.org/confluence/display/INFRA/GitHub+self-hosted+runners#:~:text=However%20this%20is%20not%20something%20to%20tackle%20lightly%2C%20as%20Infra%20will%20not%20manage%20or%20secure%20your%20VM%C2%A0%2D%20that%20is%20up%20to%20you."><strong>properly secured</strong></a>, in case any unauthorised code is pushed down from GitHub.</p>
|
||
</li>
|
||
<li>
|
||
<p>We don’t have budget to set up <a href="https://cwiki.apache.org/confluence/display/INFRA/GitHub+self-hosted+runners#:~:text=However%20this%20is%20not%20something%20to%20tackle%20lightly%2C%20as%20Infra%20will%20not%20manage%20or%20secure%20your%20VM%C2%A0%2D%20that%20is%20up%20to%20you."><strong>Virtual Machines maintained by IT Security Professionals</strong></a> for GitHub Runners anyway</p>
|
||
</li>
|
||
<li>
|
||
<p>NuttX Project might be a little <strong>too dependent on GitHub</strong>. Even if we had the funds, the ASF contract with GitHub won’t allow us to pay more for extra usage. So we’re trying alternatives.</p>
|
||
</li>
|
||
<li>
|
||
<p>Right now we’re testing a <strong>Community-Hosted Build Farm</strong> based on Ubuntu PCs and macOS: <a href="https://lupyuen.github.io/articles/ci2"><strong>“Your very own Build Farm for Apache NuttX RTOS”</strong></a></p>
|
||
</li>
|
||
</ul>
|
||
<p><img src="https://lupyuen.github.io/images/ci3-checks.png" alt="CI Checks for a Complex PR" /></p>
|
||
<h1 id="appendix-check-our-pr-submission"><a class="doc-anchor" href="#appendix-check-our-pr-submission">§</a>12 Appendix: Check our PR Submission</h1>
|
||
<p><em>Before submitting a PR to NuttX: How to check our PR thoroughly?</em></p>
|
||
<p>Yep it’s super important to <strong>thoroughly test our PRs</strong> before submitting to NuttX.</p>
|
||
<p>But NuttX Project <a href="https://lupyuen.codeberg.page/articles/ci3.html#disable-macos-and-windows-builds"><strong>doesn’t have the budget</strong></a> to run all CI Checks for New PRs. The onus is on us to test our PRs (without depending on the CI Workflow)</p>
|
||
<ol>
|
||
<li>
|
||
<p>Run the CI Builds ourselves with <strong>Docker Engine</strong></p>
|
||
</li>
|
||
<li>
|
||
<p>Or run the CI Builds with <strong>GitHub Actions</strong></p>
|
||
</li>
|
||
</ol>
|
||
<p>(1) might be slower, depending on our PC. With (2) we don’t need to worry about Wasting GitHub Runners, so long as the CI Workflow runs entirely in our own personal repo, before submitting to NuttX Repo.</p>
|
||
<p>Here are the instructions…</p>
|
||
<ul>
|
||
<li>
|
||
<p><a href="https://github.com/apache/nuttx/issues/14601#issuecomment-2452875114"><strong>CI Check: Docker vs GitHub Actions</strong></a></p>
|
||
</li>
|
||
<li>
|
||
<p><a href="https://github.com/apache/nuttx/pull/14590#issuecomment-2459178845"><strong>CI Check: Enable for PR Branch</strong></a></p>
|
||
</li>
|
||
</ul>
|
||
<p><img src="https://lupyuen.github.io/images/ci3-dashboard.png" alt="NuttX Dashboard" /></p>
|
||
<p><em>What if our PR fails the check, caused by Another PR?</em></p>
|
||
<p>We wait for the <strong>Other PR to be patched</strong>…</p>
|
||
<ol>
|
||
<li>
|
||
<p>Set our PR to <strong>Draft Mode</strong></p>
|
||
</li>
|
||
<li>
|
||
<p>Keep checking the <strong>NuttX Dashboard</strong> (above)</p>
|
||
</li>
|
||
<li>
|
||
<p>Wait patiently for the <strong>Red Error Boxes</strong> to disappear</p>
|
||
</li>
|
||
<li>
|
||
<p><a href="https://lupyuen.github.io/articles/pr#submit-the-pull-request"><strong>Rebase our PR</strong></a> with the Master Branch</p>
|
||
</li>
|
||
<li>
|
||
<p>Our PR should pass the CI Check. Set our PR to <strong>Ready for Review</strong>.</p>
|
||
</li>
|
||
</ol>
|
||
<p>Otherwise we might miss a <a href="https://github.com/apache/nuttx/actions/runs/11700129839"><strong>Serious Bug</strong></a>.</p>
|
||
<p><img src="https://github.com/user-attachments/assets/ca08db63-ecca-4b18-984e-46ba3a9716c2" alt="Screenshot 2024-10-19 at 8 11 22 AM" /></p>
|
||
<h1 id="appendix-verify-our-pr-merge"><a class="doc-anchor" href="#appendix-verify-our-pr-merge">§</a>13 Appendix: Verify our PR Merge</h1>
|
||
<p><em>When NuttX merges our PR, the Merge Job won’t run until 00:00 UTC and 12:00 UTC. How can we be really sure that our PR was merged correctly?</em></p>
|
||
<p>Let’s create a <strong>GitHub Org</strong> (at no cost), fork the NuttX Repo and trigger the <strong>CI Workflow</strong>. (Which won’t charge any extra GitHub Runner Minutes to NuttX Project!)</p>
|
||
<ul>
|
||
<li><a href="https://github.com/apache/nuttx/issues/14407"><strong>“How to Verify a PR Merge”</strong></a></li>
|
||
</ul>
|
||
<p>This will probably work if our CI Servers ever go dark.</p>
|
||
<p><img src="https://lupyuen.github.io/images/ci3-timeout.png" alt="Network Timeout at GitHub" /></p>
|
||
<h1 id="appendix-network-timeout-at-github"><a class="doc-anchor" href="#appendix-network-timeout-at-github">§</a>14 Appendix: Network Timeout at GitHub</h1>
|
||
<p><a href="https://github.com/apache/nuttx/issues/14682">(See the <strong>NuttX Issue</strong>)</a></p>
|
||
<p>Something super strange about <strong>Network Timeouts</strong> (pic above) in our CI Docker Workflows at GitHub Actions. Here’s an example…</p>
|
||
<ul>
|
||
<li>
|
||
<p>First Run fails while <a href="https://github.com/nuttxpr/nuttx/actions/runs/11535899222/job/32111488205#step:7:626"><strong>downloading something from GitHub</strong></a>…</p>
|
||
<div class="example-wrap"><pre class="language-text"><code>Configuration/Tool: imxrt1050-evk/libcxxtest,CONFIG_ARM_TOOLCHAIN_GNU_EABI
curl: (28) Failed to connect to github.com port 443 after 134188 ms: Connection timed out
make[1]: *** [libcxx.defs:28: libcxx-17.0.6.src.tar.xz] Error 28</code></pre></div></li>
|
||
<li>
|
||
<p>Second Run fails again, while <a href="https://github.com/nuttxpr/nuttx/actions/runs/11535899222/job/32112716849#step:7:536"><strong>downloading NimBLE from GitHub</strong></a>…</p>
|
||
<div class="example-wrap"><pre class="language-text"><code>Configuration/Tool: nucleo-wb55rg/nimble,CONFIG_ARM_TOOLCHAIN_GNU_EABI
curl: (28) Failed to connect to github.com port 443 after 134619 ms: Connection timed out
make[2]: *** [Makefile:55: /github/workspace/sources/apps/wireless/bluetooth/nimble_context] Error 2</code></pre></div></li>
|
||
<li>
|
||
<p><a href="https://github.com/nuttxpr/nuttx/actions/runs/11535899222"><strong>Third Run succeeds.</strong></a> Why do we keep seeing these errors: GitHub Actions with Docker, can’t connect to GitHub itself?</p>
|
||
</li>
|
||
<li>
|
||
<p>Is there a <strong>Concurrent Connection Limit</strong> for GitHub HTTPS Connections?</p>
|
||
<p>We see <strong>4 Concurrent Connections</strong> to GitHub HTTPS…</p>
|
||
<ul>
|
||
<li>
|
||
<p><a href="https://github.com/nuttxpr/nuttx/actions/runs/11535899222/job/32111489166#step:7:84"><strong>risc-v-05</strong> at 00:41:06</a></p>
|
||
</li>
|
||
<li>
|
||
<p><a href="https://github.com/nuttxpr/nuttx/actions/runs/11535899222/job/32111488582#step:7:510"><strong>xtensa-02</strong> at 00:41:17</a></p>
|
||
</li>
|
||
<li>
|
||
<p><a href="https://github.com/nuttxpr/nuttx/actions/runs/11535899222/job/32111487874#step:7:586"><strong>xtensa-01</strong> at 00:41:34</a></p>
|
||
</li>
|
||
<li>
|
||
<p><a href="https://github.com/nuttxpr/nuttx/actions/runs/11535899222/job/32111488301#step:7:532"><strong>risc-v-02</strong> at 00:41:58</a></p>
|
||
</li>
|
||
</ul>
|
||
<p>The <strong>Fifth Connection</strong> failed: <a href="https://github.com/nuttxpr/nuttx/actions/runs/11535899222/job/32111488205#step:7:619"><strong>arm-02</strong> at 00:42:52</a></p>
|
||
</li>
|
||
<li>
|
||
<p>Should we use a <a href="https://ubuntu.com/server/docs/how-to-install-a-squid-server"><strong>Caching Proxy Server</strong></a> for curl?</p>
|
||
<div class="example-wrap"><pre class="language-bash"><code>$ export https_proxy=https://1.2.3.4:1234
$ curl https://github.com/...</code></pre></div></li>
|
||
<li>
|
||
<p>Is something misconfigured in our <strong>Docker Image</strong>?</p>
|
||
<p>But the exact same Docker Image runs fine on <a href="https://lupyuen.github.io/articles/ci2"><strong>our own Build Farm</strong></a>. It <a href="https://lupyuen.codeberg.page/articles/ci2.html"><strong>doesn’t show any errors</strong></a>.</p>
|
||
</li>
|
||
<li>
|
||
<p>Is GitHub Actions starting our Docker Container with the wrong MTU (Network Packet Size)? 🤔</p>
|
||
<ul>
|
||
<li>
|
||
<p><a href="https://github.com/actions/actions-runner-controller/issues/393"><strong>GitHub Actions with Smaller MTU Size</strong></a></p>
|
||
</li>
|
||
<li>
|
||
<p><a href="https://mlohr.com/docker-mtu/"><strong>Docker MTU issues and solutions</strong></a></p>
|
||
</li>
|
||
</ul>
|
||
</li>
|
||
<li>
|
||
<p>Meanwhile I’m running a script to Restart Failed Jobs on our NuttX Mirror Repo: <a href="https://github.com/lupyuen/nuttx-release/blob/main/restart-failed-job.sh">restart-failed-job.sh</a></p>
|
||
</li>
|
||
</ul>
|
||
<p>These <strong>Timeout Errors</strong> will cost us precious GitHub Minutes. The remaining jobs get killed, and restarting these killed jobs from scratch will consume extra GitHub Minutes. (The restart below costs us 6 extra GitHub Runner Hours)</p>
|
||
<ol>
|
||
<li>
|
||
<p>How do we <strong>Retry these Timeout Errors</strong>?</p>
|
||
</li>
|
||
<li>
|
||
<p>Can we have <strong>Restartable Builds</strong>?</p>
|
||
<p>Doesn’t quite make sense to kill everything and rebuild from scratch <em>(arm6, arm7, riscv7)</em> just because one job failed <em>(xtensa2)</em></p>
|
||
</li>
|
||
<li>
|
||
<p>Or <em>xtensa2</em> should <strong>wait for others</strong> to finish, before it declares a timeout and croaks?</p>
|
||
</li>
|
||
</ol>
|
||
<div class="example-wrap"><pre class="language-text"><code>Configuration/Tool: esp32s2-kaluga-1/lvgl_st7789
curl: Failed to connect to github.com port 443 after 133994 ms:
Connection timed out</code></pre></div>
|
||
<p><a href="https://github.com/apache/nuttx/actions/runs/11395811301/job/31708665147#step:7:348">(See the <strong>Complete Log</strong>)</a></p>
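<p>On the question of retrying Timeout Errors: one possibility is to wrap the download in a small retry helper, so that a single timeout doesn’t fail the whole job. This is only a sketch and not what the NuttX Build Scripts do today. The function <em>retry</em> and its parameters are hypothetical…</p>

```shell
# Sketch only: a generic retry helper (hypothetical, not part of NuttX).

# retry MAX DELAY CMD...: run CMD up to MAX times, sleeping DELAY seconds between attempts
retry() {
  max=$1
  delay=$2
  shift 2
  attempt=1

  # Keep trying until the command succeeds
  until "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      echo "Giving up after $attempt attempts: $*" >&2
      return 1
    fi
    echo "Attempt $attempt failed, retrying in ${delay}s: $*" >&2
    attempt=$(( attempt + 1 ))
    sleep "$delay"
  done
}

# Usage (commented out, because it needs the network):
# retry 5 30 curl --fail --location --connect-timeout 20 https://github.com/...
```

<p>A shorter Connect Timeout plus a few retries might cost less than the 134 seconds we lose on each failed connection above, though we haven’t measured this.</p>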
|
||
<p><img src="https://lupyuen.github.io/images/ci3-beforeafter.jpg" alt="Previously: Our developers waited 2.5 Hours for a Pull Request to be checked. Now we wait at most 1.5 Hours" /></p>
|
||
<h1 id="appendix-build-rules-for-ci-workflow"><a class="doc-anchor" href="#appendix-build-rules-for-ci-workflow">§</a>15 Appendix: Build Rules for CI Workflow</h1>
|
||
<p>Initially we created the <strong>Build Rules</strong> for CI Workflow to solve these problems that we observed in Sep 2024…</p>
|
||
<ul>
|
||
<li>
|
||
<p>NuttX Devs need to wait (2.5 hours) for the CI Build to complete <strong>Across all Architectures</strong> <em>(Arm32, Arm64, RISC-V, Xtensa)</em>…</p>
|
||
<p>Even though we’re modifying a <strong>Single Architecture</strong>.</p>
|
||
</li>
|
||
<li>
|
||
<p>We’re using <strong>too many GitHub Runners</strong> and Build Minutes, exceeding the <a href="https://infra.apache.org/github-actions-policy.html"><strong>ASF Policy for GitHub Actions</strong></a></p>
|
||
</li>
|
||
<li>
|
||
<p>Our usage of GitHub Runners is going up ($12K per month)</p>
|
||
<p>We need to stay within the <a href="https://infra.apache.org/github-actions-policy.html"><strong>ASF Budget for GitHub Runners</strong></a> ($8.2K per month)</p>
|
||
</li>
|
||
<li>
|
||
<p>What if CI could build only the <strong>Modified Architecture</strong>?</p>
|
||
</li>
|
||
<li>
|
||
<p>Right now most of our CI Builds are taking 2.5 hours.</p>
|
||
<p>Can we <strong>complete the build within 1 hour</strong>, when we Create / Modify a Simple PR?</p>
|
||
</li>
|
||
</ul>
|
||
<p>This section explains how we coded the Build Rules. Which were mighty helpful for cutting costs in Nov 2024.</p>
|
||
<p><a href="https://github.com/apache/nuttx/issues/13775">(Discussion here)</a></p>
|
||
<h2 id="overall-solution"><a class="doc-anchor" href="#overall-solution">§</a>15.1 Overall Solution</h2>
|
||
<p>We propose a Partial Solution, based on the <a href="https://github.com/apache/nuttx/pull/13545"><strong>Arch and Board Labels</strong></a> (recently added to CI)…</p>
|
||
<ul>
|
||
<li>
|
||
<p>We target only the <strong>Simple PRs</strong>: One Arch Label + One Board Label + One Size Label.</p>
|
||
<p>Like <em>“Arch: risc-v, Board: risc-v, Size: XS”</em></p>
|
||
</li>
|
||
<li>
|
||
<p>If <em>“Arch: arm”</em> is the only non-size label, then we build only <em>arm-01, arm-02, …</em></p>
|
||
</li>
|
||
<li>
|
||
<p>Same for <em>“Board: arm”</em></p>
|
||
</li>
|
||
<li>
|
||
<p>If <strong>Arch and Board Labels</strong> are both present: They must be the same</p>
|
||
</li>
|
||
<li>
|
||
<p>Similar rules for RISC-V, Simulator, x86_64 and Xtensa</p>
|
||
</li>
|
||
<li>
|
||
<p><strong>Simple PR + Docs</strong> is still considered a Simple PR (so devs won’t be penalised for adding docs)</p>
|
||
</li>
|
||
</ul>
|
||
<h2 id="fetch-the-arch-labels"><a class="doc-anchor" href="#fetch-the-arch-labels">§</a>15.2 Fetch the Arch Labels</h2>
|
||
<p><strong>In our Build Rules:</strong> This is how we fetch the Arch Labels from a PR. And identify the PR as Arm, Arm64, RISC-V or Xtensa: <a href="https://github.com/apache/nuttx/blob/master/.github/workflows/arch.yml#L32-L104">arch.yml</a></p>
|
||
<div class="example-wrap"><pre class="language-yaml"><code># Get the Arch for the PR: arm, arm64, risc-v, xtensa, ...
- name: Get arch
  id: get-arch
  run: |

    # If PR is Not Created or Modified: Build all targets
    pr=${{github.event.pull_request.number}}
    if [[ "$pr" == "" ]]; then
      echo "Not a Created or Modified PR, will build all targets"
      exit
    fi

    # Ignore the Label "Area: Documentation", because it won't affect the Build Targets
    query='.labels | map(select(.name != "Area: Documentation")) | '
    select_name='.[].name'
    select_length='length'

    # Get the Labels for the PR: "Arch: risc-v \n Board: risc-v \n Size: XS"
    # If GitHub CLI Fails: Build all targets
    labels=$(gh pr view $pr --repo $GITHUB_REPOSITORY --json labels --jq $query$select_name || echo "")
    numlabels=$(gh pr view $pr --repo $GITHUB_REPOSITORY --json labels --jq $query$select_length || echo "")
    echo "numlabels=$numlabels" | tee -a $GITHUB_OUTPUT

    # Identify the Size, Arch and Board Labels
    if [[ "$labels" == *"Size: "* ]]; then
      echo 'labels_contain_size=1' | tee -a $GITHUB_OUTPUT
    fi
    if [[ "$labels" == *"Arch: "* ]]; then
      echo 'labels_contain_arch=1' | tee -a $GITHUB_OUTPUT
    fi
    if [[ "$labels" == *"Board: "* ]]; then
      echo 'labels_contain_board=1' | tee -a $GITHUB_OUTPUT
    fi

    # Get the Arch Label
    if [[ "$labels" == *"Arch: arm64"* ]]; then
      echo 'arch_contains_arm64=1' | tee -a $GITHUB_OUTPUT
    elif [[ "$labels" == *"Arch: arm"* ]]; then
      echo 'arch_contains_arm=1' | tee -a $GITHUB_OUTPUT
    elif [[ "$labels" == *"Arch: risc-v"* ]]; then
      echo 'arch_contains_riscv=1' | tee -a $GITHUB_OUTPUT
    elif [[ "$labels" == *"Arch: simulator"* ]]; then
      echo 'arch_contains_sim=1' | tee -a $GITHUB_OUTPUT
    elif [[ "$labels" == *"Arch: x86_64"* ]]; then
      echo 'arch_contains_x86_64=1' | tee -a $GITHUB_OUTPUT
    elif [[ "$labels" == *"Arch: xtensa"* ]]; then
      echo 'arch_contains_xtensa=1' | tee -a $GITHUB_OUTPUT
    fi

    # Get the Board Label
    if [[ "$labels" == *"Board: arm64"* ]]; then
      echo 'board_contains_arm64=1' | tee -a $GITHUB_OUTPUT
    elif [[ "$labels" == *"Board: arm"* ]]; then
      echo 'board_contains_arm=1' | tee -a $GITHUB_OUTPUT
    elif [[ "$labels" == *"Board: risc-v"* ]]; then
      echo 'board_contains_riscv=1' | tee -a $GITHUB_OUTPUT
    elif [[ "$labels" == *"Board: simulator"* ]]; then
      echo 'board_contains_sim=1' | tee -a $GITHUB_OUTPUT
    elif [[ "$labels" == *"Board: x86_64"* ]]; then
      echo 'board_contains_x86_64=1' | tee -a $GITHUB_OUTPUT
    elif [[ "$labels" == *"Board: xtensa"* ]]; then
      echo 'board_contains_xtensa=1' | tee -a $GITHUB_OUTPUT
    fi

  env:
    GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}</code></pre></div>
|
||
<p>Why “<code> || echo ""</code>”? That’s because if the <strong>GitHub CLI gh</strong> fails for any reason, we shall build all targets.</p>
|
||
<p>This ensures that our CI Workflow won’t get disrupted due to errors in GitHub CLI.</p>
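<p>To see the fallback in action, here’s a tiny sketch that we can run on our own computer. The function <em>fetch_labels</em> is a hypothetical stand-in for the <em>gh pr view</em> command (it fails on demand), and <em>get_labels</em> applies the same “<code> || echo ""</code>” pattern…</p>

```shell
# Sketch only: fetch_labels is a hypothetical stand-in for "gh pr view ... --jq ..."
# It prints two labels, or fails when the first argument is "fail"
fetch_labels() {
  if [ "$1" = "fail" ]; then
    echo "network error" >&2
    return 1
  fi
  printf 'Arch: risc-v\nSize: XS\n'
}

# Same pattern as the Build Rules: if the command fails, fall back to an empty string
get_labels() {
  fetch_labels "$1" 2>/dev/null || echo ""
}

# Command succeeds: we get the labels
labels=$(get_labels ok)
echo "labels=[$labels]"

# Command fails: we get an empty string, and the script carries on
labels=$(get_labels fail)
echo "labels=[$labels]"
```

<p>With empty labels, none of the <em>labels_contain_*</em> outputs are set. So the check in the next section reports that the Size Label is missing, and builds all targets.</p>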
|
||
<h2 id="limit-to-simple-prs"><a class="doc-anchor" href="#limit-to-simple-prs">§</a>15.3 Limit to Simple PRs</h2>
|
||
<p>We handle only <strong>Simple PRs</strong>: One Arch Label + One Board Label + One Size Label.</p>
|
||
<p>Like <em>“Arch: risc-v, Board: risc-v, Size: XS”</em>.</p>
|
||
<p>If it’s <strong>Not a Simple PR</strong>: We build everything. Like so: <a href="https://github.com/apache/nuttx/blob/master/.github/workflows/arch.yml#L130-L189">arch.yml</a></p>
|
||
<div class="example-wrap"><pre class="language-yaml"><code># inputs.boards is a JSON Array: ["arm-01", "risc-v-01", "xtensa-01", ...]
# We compact and remove the newlines
boards=$( echo '${{ inputs.boards }}' | jq --compact-output ".")
numboards=$( echo "$boards" | jq "length" )

# We consider only Simple PRs with:
# Arch + Size Labels Only
# Board + Size Labels Only
# Arch + Board + Size Labels Only
if [[ "$labels_contain_size" != "1" ]]; then
  echo "Size Label Missing, will build all targets"
  quit=1
elif [[ "$numlabels" == "2" && "$labels_contain_arch" == "1" ]]; then
  echo "Arch + Size Labels Only"
elif [[ "$numlabels" == "2" && "$labels_contain_board" == "1" ]]; then
  echo "Board + Size Labels Only"
elif [[ "$numlabels" == "3" && "$labels_contain_arch" == "1" && "$labels_contain_board" == "1" ]]; then
  # Arch and Board must be the same
  if [[
    "$arch_contains_arm" != "$board_contains_arm" ||
    "$arch_contains_arm64" != "$board_contains_arm64" ||
    "$arch_contains_riscv" != "$board_contains_riscv" ||
    "$arch_contains_sim" != "$board_contains_sim" ||
    "$arch_contains_x86_64" != "$board_contains_x86_64" ||
    "$arch_contains_xtensa" != "$board_contains_xtensa"
  ]]; then
    echo "Arch and Board are not the same, will build all targets"
    quit=1
  else
    echo "Arch + Board + Size Labels Only"
  fi
else
  echo "Not a Simple PR, will build all targets"
  quit=1
fi

# If Not a Simple PR: Build all targets
if [[ "$quit" == "1" ]]; then
  # If PR was Created or Modified: Exclude some boards
  pr=${{github.event.pull_request.number}}
  if [[ "$pr" != "" ]]; then
    echo "Excluding arm-0[1249], arm-1[124-9], risc-v-04..06, sim-03, xtensa-02"
    boards=$(
      echo '${{ inputs.boards }}' |
      jq --compact-output \
      'map(
        select(
          test("arm-0[1249]") == false and test("arm-1[124-9]") == false and
          test("risc-v-0[4-9]") == false and
          test("sim-0[3-9]") == false and
          test("xtensa-0[2-9]") == false
        )
      )'
    )
  fi
  echo "selected_builds=$boards" | tee -a $GITHUB_OUTPUT
  exit
fi</code></pre></div><h2 id="identify-the-non-arm-builds"><a class="doc-anchor" href="#identify-the-non-arm-builds">§</a>15.4 Identify the Non-Arm Builds</h2>
|
||
<p>Suppose the PR says <em>“Arch: arm”</em> or <em>“Board: arm”</em>.</p>
|
||
<p>We filter out the builds that should be skipped (RISC-V, Xtensa, etc): <a href="https://github.com/apache/nuttx/blob/master/.github/workflows/arch.yml#L189-L254">arch.yml</a></p>
|
||
<div class="example-wrap"><pre class="language-yaml"><code># For every board
for (( i=0; i<numboards; i++ ))
do
  # Fetch the board
  board=$( echo "$boards" | jq ".[$i]" )
  skip_build=0

  # For "Arch / Board: arm": Build arm-01, arm-02, ...
  if [[ "$arch_contains_arm" == "1" || "$board_contains_arm" == "1" ]]; then
    if [[ "$board" != *"arm"* ]]; then
      skip_build=1
    fi
  # Omitted: Arm64, RISC-V, Simulator, x86_64, Xtensa
  ...
  # For Other Arch: Allow the build
  else
    echo Build by default: $board
  fi

  # Add the board to the selected builds
  if [[ "$skip_build" == "0" ]]; then
    echo Add $board to selected_builds
    if [[ "$selected_builds" == "" ]]; then
      selected_builds=$board
    else
      selected_builds=$selected_builds,$board
    fi
  fi
done

# Return the selected builds as JSON Array
# If Selected Builds is empty: Skip all builds
echo "selected_builds=[$selected_builds]" | tee -a $GITHUB_OUTPUT
if [[ "$selected_builds" == "" ]]; then
  echo "skip_all_builds=1" | tee -a $GITHUB_OUTPUT
fi</code></pre></div><h2 id="skip-the-non-arm-builds"><a class="doc-anchor" href="#skip-the-non-arm-builds">§</a>15.5 Skip The Non-Arm Builds</h2>
|
||
<p>Earlier we saw the code in <em>arch.yml</em> <a href="https://docs.github.com/en/actions/sharing-automations/reusing-workflows"><strong>Reusable Workflow</strong></a> that identifies the builds to be skipped.</p>
|
||
<p>The code above is called by <em>build.yml</em> (Build Workflow). Which will actually skip the builds: <a href="https://github.com/apache/nuttx/blob/master/.github/workflows/build.yml#L119-L148">build.yml</a></p>
|
||
<div class="example-wrap"><pre class="language-yaml"><code># Select the Linux Builds based on PR Arch Label
Linux-Arch:
  uses: apache/nuttx/.github/workflows/arch.yml@master
  needs: Fetch-Source
  with:
    os: Linux
    boards: |
      [
        "arm-01", "risc-v-01", "sim-01", "xtensa-01", "arm64-01", "x86_64-01", "other",
        "arm-02", "risc-v-02", "sim-02", "xtensa-02",
        "arm-03", "risc-v-03", "sim-03",
        "arm-04", "risc-v-04",
        "arm-05", "risc-v-05",
        "arm-06", "risc-v-06",
        "arm-07", "arm-08", "arm-09", "arm-10", "arm-11", "arm-12", "arm-13", "arm-14"
      ]

# Run the selected Linux Builds
Linux:
  needs: Linux-Arch
  if: ${{ needs.Linux-Arch.outputs.skip_all_builds != '1' }}
  runs-on: ubuntu-latest
  env:
    DOCKER_BUILDKIT: 1

  strategy:
    max-parallel: 12
    matrix:
      boards: ${{ fromJSON(needs.Linux-Arch.outputs.selected_builds) }}

  steps:
    ## Omitted: Run cibuild.sh on Linux</code></pre></div>
|
||
<p>Why <em>“needs: Fetch-Source”</em>? That’s because the PR Labeler runs <strong>concurrently in the background</strong>.</p>
|
||
<p>When we add <em>Fetch-Source</em> as a <strong>Job Dependency</strong>: We give the PR Labeler sufficient time to run (1 min), before we read the PR Label in <em>arch.yml</em>.</p>
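<p>How does <em>selected_builds</em> become something that <em>fromJSON</em> can parse? Here’s a small sketch of the idea, runnable outside GitHub Actions. The board list is hypothetical sample data. We keep the JSON double quotes around each board name, the same way <em>jq</em> prints them in <em>arch.yml</em>, then join the names with commas…</p>

```shell
# Sketch only: hypothetical sample of board names, JSON quotes included
boards='"arm-01" "risc-v-01" "arm-02" "xtensa-01"'

selected_builds=""
for board in $boards; do

  # Pretend the PR says "Arch: arm": keep the Arm builds, skip the rest
  case "$board" in
    *arm*) ;;
    *) continue ;;
  esac

  # Join the selected boards with commas
  if [ -z "$selected_builds" ]; then
    selected_builds=$board
  else
    selected_builds=$selected_builds,$board
  fi
done

# Wrap in square brackets to form a JSON Array
echo "selected_builds=[$selected_builds]"
# Prints: selected_builds=["arm-01","arm-02"]
```

<p>If nothing is selected, the result is an empty array. That’s presumably why <em>arch.yml</em> also emits <em>skip_all_builds</em>: the <em>Linux</em> job checks it and skips the job, rather than running a Build Matrix with no values.</p>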
|
||
<h2 id="same-for-other-builds"><a class="doc-anchor" href="#same-for-other-builds">§</a>15.6 Same for Other Builds</h2>
|
||
<p>We do the same for Arm64, RISC-V, Simulator, x86_64 and Xtensa: <a href="https://github.com/apache/nuttx/blob/master/.github/workflows/arch.yml#L202-L232">arch.yml</a></p>
|
||
<div class="example-wrap"><pre class="language-yaml"><code># For "Arch / Board: arm64": Build arm64-01
elif [[ "$arch_contains_arm64" == "1" || "$board_contains_arm64" == "1" ]]; then
  if [[ "$board" != *"arm64-"* ]]; then
    skip_build=1
  fi

# For "Arch / Board: risc-v": Build risc-v-01, risc-v-02, ...
elif [[ "$arch_contains_riscv" == "1" || "$board_contains_riscv" == "1" ]]; then
  if [[ "$board" != *"risc-v-"* ]]; then
    skip_build=1
  fi

# For "Arch / Board: simulator": Build sim-01, sim-02
elif [[ "$arch_contains_sim" == "1" || "$board_contains_sim" == "1" ]]; then
  if [[ "$board" != *"sim-"* ]]; then
    skip_build=1
  fi

# For "Arch / Board: x86_64": Build x86_64-01
elif [[ "$arch_contains_x86_64" == "1" || "$board_contains_x86_64" == "1" ]]; then
  if [[ "$board" != *"x86_64-"* ]]; then
    skip_build=1
  fi

# For "Arch / Board: xtensa": Build xtensa-01, xtensa-02
elif [[ "$arch_contains_xtensa" == "1" || "$board_contains_xtensa" == "1" ]]; then
  if [[ "$board" != *"xtensa-"* ]]; then
    skip_build=1
  fi</code></pre></div>
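<p>All five branches above apply the same test: skip the Build Job unless its name contains the prefix for the labelled Arch or Board. As a rough sketch, the test could be factored into one helper. The function name <em>skip_build_for</em> is hypothetical and isn't part of <em>arch.yml</em>.</p>

```bash
#!/usr/bin/env bash
# Hypothetical helper (not in arch.yml): print 1 if the Build Job
# should be skipped for the labelled Arch / Board, else print 0.
# $1 = prefix for the label (e.g. "arm64-", "risc-v-", "sim-")
# $2 = name of the Build Job (e.g. "arm64-01")
skip_build_for() {
  local prefix="$1" board="$2"
  if [[ "$board" != *"$prefix"* ]]; then
    echo 1   # Build Job doesn't match the label: skip it
  else
    echo 0   # Build Job matches the label: build it
  fi
}

# For a PR labelled "Arch: risc-v", show which Build Jobs are skipped
for board in arm-01 arm64-01 risc-v-01 risc-v-02 sim-01 x86_64-01; do
  echo "$board: skip_build=$(skip_build_for "risc-v-" "$board")"
done
```

<p>Note that <em>“arm64-01”</em> doesn’t contain <em>“arm-”</em>, so a substring test like this keeps the Arm32 and Arm64 jobs apart.</p>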
<p><img src="https://lupyuen.github.io/images/ci3-macos.jpg" alt="Disable macOS Builds" /></p>
<h2 id="skip-the-macos-builds"><a class="doc-anchor" href="#skip-the-macos-builds">§</a>15.7 Skip the macOS Builds</h2>
<p><strong>For Simple PRs and Complex PRs:</strong> We skip the macOS builds <em>(macos, macos/sim-*)</em> since these builds are costly: <a href="https://github.com/apache/nuttx/blob/master/.github/workflows/build.yml#L196-L256">build.yml</a></p>
<p>(macOS builds will take <strong>2 hours to complete</strong> due to the queueing for macOS Runners)</p>
<div class="example-wrap"><pre class="language-yaml"><code># Select the macOS Builds based on PR Arch Label
macOS-Arch:
  uses: apache/nuttx/.github/workflows/arch.yml@master
  needs: Fetch-Source
  with:
    os: macOS
    boards: |
      ["macos", "sim-01", "sim-02", "sim-03"]

# Run the selected macOS Builds
macOS:
  permissions:
    contents: none
  runs-on: macos-13
  needs: macOS-Arch
  if: ${{ needs.macOS-Arch.outputs.skip_all_builds != '1' }}
  strategy:
    max-parallel: 2
    matrix:
      boards: ${{ fromJSON(needs.macOS-Arch.outputs.selected_builds) }}
  steps:
    ## Omitted: Run cibuild.sh on macOS</code></pre></div>
<p><em>skip_all_builds</em> for macOS will be set to <code>1</code>: <a href="https://github.com/apache/nuttx/blob/master/.github/workflows/arch.yml#L100-L112">arch.yml</a></p>
<div class="example-wrap"><pre class="language-yaml"><code># Select the Builds for the PR: arm-01, risc-v-01, xtensa-01, ...
- name: Select builds
  id: select-builds
  run: |

    # Skip all macOS Builds
    if [[ "${{ inputs.os }}" == "macOS" ]]; then
      echo "Skipping all macOS Builds"
      echo "skip_all_builds=1" | tee -a $GITHUB_OUTPUT
      exit
    fi</code></pre></div>
<h2 id="ignore-the-docs-label"><a class="doc-anchor" href="#ignore-the-docs-label">§</a>15.8 Ignore the Docs Label</h2>
<p>NuttX Devs shouldn’t be <strong>penalised for adding docs</strong>!</p>
<p>That’s why we ignore the label <em>“Area: Documentation”</em>. Which means that <strong>Simple PR + Docs</strong> is still a Simple PR.</p>
<p>And will skip the unnecessary builds: <a href="https://github.com/apache/nuttx/blob/master/.github/workflows/arch.yml#L44-L55">arch.yml</a></p>
<div class="example-wrap"><pre class="language-yaml"><code># Ignore the Label "Area: Documentation", because it won't affect the Build Targets
query='.labels | map(select(.name != "Area: Documentation")) | '
select_name='.[].name'
select_length='length'

# Get the Labels for the PR: "Arch: risc-v \n Board: risc-v \n Size: XS"
# If GitHub CLI Fails: Build all targets
# (The jq filter contains spaces, so it is quoted to stay as one argument)
labels=$(gh pr view $pr --repo $GITHUB_REPOSITORY --json labels --jq "$query$select_name" || echo "")
numlabels=$(gh pr view $pr --repo $GITHUB_REPOSITORY --json labels --jq "$query$select_length" || echo "")
echo "numlabels=$numlabels" | tee -a $GITHUB_OUTPUT</code></pre></div>
<h2 id="sync-to-nuttx-apps"><a class="doc-anchor" href="#sync-to-nuttx-apps">§</a>15.9 Sync to NuttX Apps</h2>
<p>Remember to sync <em>build.yml</em> and <em>arch.yml</em> from <strong>NuttX Repo to NuttX Apps</strong>!</p>
<p><a href="https://github.com/apache/nuttx-apps/pull/2676">(See the <strong>Pull Request</strong>)</a></p>
<p><em>How are they connected?</em></p>
<ul>
<li>
<p><em>build.yml</em> points to <em>arch.yml</em> for the <strong>Build Rules</strong>.</p>
<p>When we sync <em>build.yml</em> from NuttX Repo to NuttX Apps, we won’t need to remove the references to <em>arch.yml</em>.</p>
</li>
<li>
<p>We could make <em>nuttx-apps/build.yml</em> point to <em>nuttx/arch.yml</em>.</p>
<p>But that would make the <strong>CI Fragile</strong>: Changes to <em>nuttx/arch.yml</em> might cause <em>nuttx-apps/build.yml</em> to break.</p>
</li>
<li>
<p>That’s why we point <em>nuttx-apps/build.yml</em> to <em>nuttx-apps/arch.yml</em> instead.</p>
</li>
</ul>
<p><em>But NuttX Apps don’t need Build Rules?</em></p>
<ul>
<li>
<p><em>arch.yml</em> is kinda redundant in NuttX Apps. Everything is a <strong>Complex PR</strong>!</p>
</li>
<li>
<p>I have difficulty keeping <em>nuttx/build.yml</em> and <em>nuttx-apps/build.yml</em> in sync. That’s why I simply copied over <em>arch.yml</em> as-is.</p>
</li>
<li>
<p>In future we could extend <em>arch.yml</em> with <strong>App-Specific</strong> Build Rules.</p>
</li>
</ul>
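<p>As a minimal sketch of that sync step, assuming both repos are checked out locally: copy <em>arch.yml</em> as-is, then verify that the NuttX Apps <em>build.yml</em> calls its own copy of <em>arch.yml</em>. The function name <em>sync_arch_yml</em> and the checkout paths are hypothetical. This isn’t a script from the NuttX Repo.</p>

```bash
#!/usr/bin/env bash
# Hypothetical helper: copy arch.yml as-is from a NuttX checkout to a
# NuttX Apps checkout, then check that the Apps build.yml points to
# nuttx-apps/arch.yml (and not nuttx/arch.yml).
# $1 = path of the nuttx checkout (assumed)
# $2 = path of the nuttx-apps checkout (assumed)
sync_arch_yml() {
  local src="$1/.github/workflows" dst="$2/.github/workflows"
  cp "$src/arch.yml" "$dst/arch.yml" || return 1
  if grep -qF "uses: apache/nuttx-apps/.github/workflows/arch.yml@master" "$dst/build.yml"; then
    echo "OK: build.yml points to nuttx-apps/arch.yml"
  else
    echo "WARNING: build.yml doesn't point to nuttx-apps/arch.yml"
    return 1
  fi
}

# Usage, assuming the two checkouts sit side by side:
#   sync_arch_yml ./nuttx ./apps
```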
<p><em>CI Build Workflow looks very different now?</em></p>
<p>Yeah our <strong>CI Build Workflow</strong> used to be simpler: <a href="https://github.com/apache/nuttx/blob/6a0c0722e23f5fc294a4574111742765e8c0dd04/.github/workflows/build.yml#L117-L179">build.yml</a></p>
<div class="example-wrap"><pre class="language-yaml"><code>Linux:
  needs: Fetch-Source
  strategy:
    matrix:
      boards: [arm-01, arm-02, arm-03, arm-04, arm-05, arm-06, arm-07, arm-08, arm-09, arm-10, arm-11, arm-12, arm-13, other, risc-v-01, risc-v-02, sim-01, sim-02, xtensa-01, xtensa-02]</code></pre></div>
<p>Now with <strong>Build Rules</strong>, it becomes more complicated: <a href="https://github.com/apache/nuttx/blob/master/.github/workflows/build.yml#L118-L196">build.yml</a></p>
<div class="example-wrap"><pre class="language-yaml"><code># Select the Linux Builds based on PR Arch Label
Linux-Arch:
  uses: apache/nuttx-apps/.github/workflows/arch.yml@master
  needs: Fetch-Source
  with:
    boards: |
      [
        "arm-01", "other", "risc-v-01", "sim-01", "xtensa-01", ...
      ]

# Run the selected Linux Builds
Linux:
  needs: Linux-Arch
  if: ${{ needs.Linux-Arch.outputs.skip_all_builds != '1' }}
  strategy:
    matrix:
      boards: ${{ fromJSON(needs.Linux-Arch.outputs.selected_builds) }}</code></pre></div>
<p>One thing remains the same: We configure the <strong>Target Groups</strong> in <em>build.yml</em>. (Instead of <em>arch.yml</em>)</p>
<h2 id="actual-performance"><a class="doc-anchor" href="#actual-performance">§</a>15.10 Actual Performance</h2>
<p>For our Initial Implementation of Build Rules: We recorded the <strong>CI Build Performance</strong> for Simple PRs.</p>
<p>Then we made the Simple PRs faster…</p>
<div><table><thead><tr><th style="text-align: left">Build Time</th><th style="text-align: center">Before</th><th style="text-align: center">After</th></tr></thead><tbody>
<tr><td style="text-align: left">Arm32</td><td style="text-align: center"><a href="https://github.com/apache/nuttx/actions/runs/11210724531"><strong>2 hours</strong></a></td><td style="text-align: center"><a href="https://github.com/apache/nuttx/actions/runs/11707495067"><strong>1.5 hours</strong></a></td></tr>
<tr><td style="text-align: left">Arm64</td><td style="text-align: center"><a href="https://github.com/apache/nuttx/actions/runs/11140028404"><strong>2.2 hours</strong></a></td><td style="text-align: center"><a href="https://github.com/apache/nuttx/actions/runs/11704164434"><strong>30 mins</strong></a></td></tr>
<tr><td style="text-align: left">RISC-V</td><td style="text-align: center"><a href="https://github.com/apache/nuttx/actions/runs/11163805578"><strong>1.8 hours</strong></a></td><td style="text-align: center"><a href="https://github.com/apache/nuttx/actions/runs/11669727849"><strong>50 mins</strong></a></td></tr>
<tr><td style="text-align: left">Xtensa</td><td style="text-align: center"><a href="https://github.com/apache/nuttx/actions/runs/11105657530"><strong>2.2 hours</strong></a></td><td style="text-align: center"><a href="https://github.com/apache/nuttx/actions/runs/11699279596"><strong>1.5 hours</strong></a></td></tr>
<tr><td style="text-align: left">x86_64</td><td style="text-align: center"><a href="https://github.com/apache/nuttx/actions/runs/11158309196"><strong>2.2 hours</strong></a></td><td style="text-align: center"><a href="https://github.com/apache/nuttx/actions/runs/11661703226"><strong>10 mins</strong></a></td></tr>
<tr><td style="text-align: left">Simulator</td><td style="text-align: center"><a href="https://github.com/apache/nuttx/actions/runs/11146942454"><strong>2.2 hours</strong></a></td><td style="text-align: center"><a href="https://github.com/apache/nuttx/actions/runs/11499427672"><strong>1 hour</strong></a></td></tr>
</tbody></table>
</div>
<p><em>How did we make the Simple PRs faster?</em></p>
<ul>
<li>
<p>We broke up <strong>Big Jobs</strong> <em>(arm-05, risc-v-01, risc-v-02)</em> into Multiple Smaller Jobs.</p>
<p><strong>Small Jobs</strong> will really fly! <a href="https://docs.google.com/spreadsheets/d/1OdBxe30Sw3yhH0PyZtgmefelOL56fA6p26vMgHV0MRY/edit?gid=0#gid=0">(See the Build Job Details)</a></p>
<p>(We moved the RP2040 jobs from <em>arm-05</em> to <em>arm-06</em>, then added <em>arm-14</em>. Followed by jobs <em>risc-v-03 … risc-v-06</em>)</p>
</li>
<li>
<p>We saw a <strong>27% Reduction in GitHub Runner Hours</strong>! From <a href="https://github.com/apache/nuttx/actions/runs/11210724531/usage"><strong>15 Runner Hours</strong></a> down to <a href="https://github.com/apache/nuttx/actions/runs/11217886131/usage"><strong>11 Runner Hours</strong></a> per Arm32 Build.</p>
</li>
<li>
<p>We split the <strong>Board Labels</strong> according to Arch, like <em>“Board: arm”</em>.</p>
<p>Thus <em>“Board: arm”</em> should build the exact same way as <em>“Arch: arm”</em>.</p>
<p>Same for <em>“Board: arm, Arch: arm”</em>. We updated the Build Rules to use the Board Labels.</p>
</li>
<li>
<p>We split the <em>other</em> job into <em>arm64</em> and <em>x86_64</em>.</p>
</li>
</ul>
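<p>The 27% figure follows directly from the two Runner Hour numbers above. Here’s a small sketch of the arithmetic, using integer maths in Bash. The function name <em>percent_reduction</em> is just for illustration.</p>

```bash
#!/usr/bin/env bash
# Print the percentage reduction from "before" to "after",
# rounded to the nearest whole percent (integer maths only).
# $1 = value before, $2 = value after (same unit, before > 0)
percent_reduction() {
  local before="$1" after="$2"
  # Compute in tenths of a percent, then round to a whole percent
  local tenths=$(( (before - after) * 1000 / before ))
  echo $(( (tenths + 5) / 10 ))
}

# Arm32 Build: 15 Runner Hours down to 11 Runner Hours
percent_reduction 15 11    # Prints 27

# Arm64 Build Time from the table: 2.2 hours (132 mins) down to 30 mins
percent_reduction 132 30   # Prints 77
```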
<p><strong>Up Next:</strong> Reorg and rename the CI Build Jobs, for better performance and easier maintenance. But how?</p>
<ul>
<li>
<p>I have a hunch that CI works better when we pack the jobs into <strong>One-Hour Time Slices</strong>.</p>
</li>
<li>
<p>Kinda like packing yummy goodies into <strong>Bento Boxes</strong>, making sure they don’t overflow the Time Boxes :-)</p>
</li>
<li>
<p>We should probably shift the <strong>Riskiest / Most Failure Prone</strong> builds into the First Build Job <em>(arm-00, risc-v-00, sim-00)</em>.</p>
<p>And we shall <strong>Fail Faster</strong> (in case of problems), skipping the rest of the jobs.</p>
</li>
<li>
<p>Recently we’ve seen many builds for <a href="https://github.com/apache/nuttx/pulls?q=is%3Apr+is%3Aclosed+goldfish+"><strong>Arm32 Goldfish</strong></a>.</p>
<p>Can we limit the builds to the <strong>Goldfish Boards</strong> only?</p>
<p>To identify <strong>Goldfish PRs</strong>, we can label the PRs like this: <em>“Arch: arm, SubArch: goldfish”</em> and <em>“Board: arm, SubBoard: goldfish”</em></p>
</li>
<li>
<p>Instead of Building an <strong>Entire Arch</strong> <em>(arm-01)</em>…</p>
<p>Can we build <strong>One Single SubArch</strong> <em>(stm32)</em>?</p>
<p>How will we <strong>Filter the Build Jobs</strong> (e.g. <em>arm-01</em>) that should be built for a SubArch (e.g. <em>stm32</em>)? <a href="https://gist.github.com/lupyuen/bccd1ac260603a2e3cd7440b8b4ee86c">(Maybe like this)</a></p>
<p><a href="https://github.com/apache/nuttx/issues/13775">(Discussion here)</a></p>
</li>
</ul>
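<p>The Bento Box idea could be prototyped with a simple First-Fit Decreasing packing: sort the build targets by duration, then drop each one into the first Time Box that still has room. This is only a sketch of one possible approach. The function <em>pack_jobs</em> is hypothetical, and the durations below are made up for illustration, not measured from NuttX CI.</p>

```bash
#!/usr/bin/env bash
# Hypothetical sketch: pack build targets into Time Boxes of at most
# $1 minutes, using First-Fit Decreasing.
# Input (stdin): one "<minutes> <target>" per line.
pack_jobs() {
  local limit="$1"
  # Longest targets first, so the big ones claim their boxes early
  sort -rn | {
    local -a used=() names=()
    local minutes target placed i
    while read -r minutes target; do
      placed=0
      # Find the first Time Box with enough room left
      for i in "${!used[@]}"; do
        if (( used[i] + minutes <= limit )); then
          (( used[i] += minutes ))
          names[i]+=" $target"
          placed=1
          break
        fi
      done
      # No box has room: open a new Time Box
      if (( ! placed )); then
        used+=("$minutes")
        names+=("$target")
      fi
    done
    for i in "${!used[@]}"; do
      echo "job-$i (${used[i]} min): ${names[i]}"
    done
  }
}

# Made-up durations (minutes), packed into One-Hour Time Slices
printf '%s\n' "40 arm-01" "30 arm-02" "20 sim-01" "50 risc-v-01" | pack_jobs 60
```

<p>With these made-up numbers, the four targets fit into three Time Boxes, and none of them overflows 60 minutes. A target longer than the limit would still get a box of its own, so it would need to be split up first.</p>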
<p><img src="https://lupyuen.github.io/images/ci3-hike.jpg" alt="Spot the exact knotty moment that we were told about the CI Shutdown" /></p>
<p><a href="https://www.strava.com/activities/12673094079"><em>Spot the exact knotty moment that we were told about the CI Shutdown</em></a></p>

<!-- Begin scripts/rustdoc-after.html: Post-HTML for Custom Markdown files processed by rustdoc, like chip8.md -->

<!-- Begin Theme Picker and Prism Theme -->
<script src="../theme.js"></script>
<script src="../prism.js"></script>
<!-- Theme Picker and Prism Theme -->

<!-- End scripts/rustdoc-after.html -->

</body>
</html>