At LinkedIn, we use this curriculum to onboard our entry-level talent into the SRE role.

School of SRE

In early 2019, we started visiting campuses across India to recruit the best and brightest minds to ensure that LinkedIn, and all the services that make up its complex technology stack, are always available for everyone. This critical function at LinkedIn falls under the purview of the Site Engineering team and Site Reliability Engineers (SREs), who are Software Engineers specializing in reliability. SREs apply the principles of computer science and engineering to the design, development, and operation of computer systems: generally, large-scale, distributed ones.

As we continued on this journey, we started getting a lot of questions from these campuses about what exactly the site reliability engineering role entails, and how someone could learn the skills and disciplines involved in becoming a successful site reliability engineer. Fast forward a few months, and a few of these campus students had joined LinkedIn, either as interns or as full-time engineers, to become part of the Site Engineering team; we also had a few lateral hires who joined our organization and were not from a traditional SRE background. That's when a few of us got together and started to think about how we could onboard new graduate engineers to the Site Engineering team.

There is a vast number of resources scattered throughout the web on what the roles and responsibilities of SREs are, how to monitor site health, handle production incidents, define SLOs/SLIs, etc. But there are very few resources out there guiding someone on the basic skill sets one has to acquire as a beginner. Because of this lack of resources, we felt that individuals have a tough time getting into open positions in the industry. We created the School of SRE as a starting point for anyone wanting to build their career as an SRE.

In this course, we focus on building strong foundational skills. The course is structured to provide real-life examples and to show how each of these topics plays an important role in day-to-day SRE life. Currently, we cover the following topics under the School of SRE:

We believe continuous learning will help in acquiring deeper knowledge and competencies in order to expand your skill sets, so every module has added references that can serve as a guide for further learning. Our hope is that by going through these modules, you will be able to build the essential skills required of a Site Reliability Engineer.

At LinkedIn, we use this curriculum to onboard our non-traditional hires and new college grads into the SRE role. We have had multiple rounds of successful onboarding with new employees, and the course helped them become productive in a very short period of time. This motivated us to open source the content to help other organizations onboard new engineers into the role and to provide guidance for aspiring individuals who want to get into the role. We realize that the initial content we created is just a starting point, and we hope that the community can help in the journey of refining and expanding the content. Check out the contributing guide to get started.

Comments
  • Commands for Viewing Files

    In addition to the cat, head, and tail commands, we should add less and more too, as these are commonly used in everyday cases and also provide faster access.

  • Suggestion to improve readability of shell commands

    I was looking at the shell commands here and I found the user prompt very distracting. It currently looks like this.

    spatel1-mn1:school-of-sre spatel1$ git branch b1
    spatel1-mn1:school-of-sre spatel1$ git log --oneline --graph
    * 7f3b00e (HEAD -> master, b1) adding file 2
    * df2fb7a adding file 1
    

    but it would be better in my opinion if it looked like this

    $ git branch b1
    $ git log --oneline --graph
    * 7f3b00e (HEAD -> master, b1) adding file 2
    * df2fb7a adding file 1
    

    or, since the current branch is important for understanding what a git command would do:

    (master)$ git branch b1
    (master)$ git log --oneline --graph
    * 7f3b00e (HEAD -> master, b1) adding file 2
    * df2fb7a adding file 1
    

    I think this is better and easier to understand since the git commands stand out

    What do you think?

    I'm also willing to make a PR for this if this sounds like a reasonable idea.

    Thanks for sharing free knowledge, I wish I came across this when I was in uni. Cheers!

  • Add observability

    What do you think about adding an observability course?

    It could cover the definition, what a metric is, how to get, store, correlate, and show metrics, and why, along with tools such as Prometheus, Grafana, ELK, Jaeger, etc.

  • GNU deserves to be included in Linux Introduction

    Sad to see that Linux is not mentioned as GNU/Linux, at least at the very beginning.

    Please do something about this or assign this to me.

    Thank you.

    Best, Rav

  • Reorder the linux file system information to reflect the same order

    Why

    The order of directories in the Linux file system organization image and in the text under the image do not match. In addition, information about the /root directory is not provided.

    When a new user tries to build the site with mkdocs, they get an error. This change fixes it.

    What is changing

    • Reordering the file system entries to match the order in the picture makes the text more readable. /root is also added.
    • The mkdocs build command fails with the error message AttributeError: module 'jinja2' has no attribute 'contextfilter'. jinja2 is added to requirement.txt and set to 3.0.3.
    • Multiple occurrences of "and" in the What are Linux operating systems section.

  • word 'swimlane' in diagram

    Image https://linkedin.github.io/school-of-sre/systems_design/images/swimlane-1.jpg

    on this page https://linkedin.github.io/school-of-sre/systems_design/fault-tolerance/

    The word swimlane in the diagram is likely extraneous. Please consider revising the image. Thank you

  • Katacoda.com is now closed

    Hello, I really love how helpful this manual is and while I was going through it I found out there were some mentions of katacoda.com for hands-on labs in the kubernetes section. Katacoda.com is now closed, actually it has become O'Reilly exclusive. References:

    • https://www.katacoda.com/courses/kubernetes/playground
    • https://www.oreilly.com/online-learning/leveraging-katacoda-technology.html

    An alternative to this can be the labs at play with kubernetes: https://labs.play-with-k8s.com/

    Changes should be done primarily in the file: orchestration_with_kubernetes.md

    If approved, I would like to update this and send in a PR. Cheers!

  • Upgrade mkdocs-material to add dark mode.

    Issue: https://github.com/mkdocs/mkdocs/issues/2799

    Tweet: https://twitter.com/readthedocs/status/1507388916013314048

    Fix: Pin jinja2 < 3.1.0 until the mkdocs version is upgraded.

  • Minor issue with redirection to GFG site and a typo

    Inside school-of-sre\courses\level102\containerization_and_orchestration\intro_to_containers.md, line 177, clicking on the link in the line "If you want to try out a more in-depth exercise on cgroups, check out this tutorial from Geeks for Geeks." leads to an Error 404 Not Found page. Adding 'https://' as a prefix to the link fixes this. Page Link: https://linkedin.github.io/school-of-sre/level102/containerization_and_orchestration/intro_to_containers/

    Inside school-of-sre\courses\level102\linux_intermediate\bashscripting.md, the monitoring script is missing the first '#' of the shebang line: it should be #!/bin/bash instead of !/bin/bash. Page Link: https://linkedin.github.io/school-of-sre/level102/linux_intermediate/bashscripting/
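
    To illustrate why the leading '#' matters, here is a minimal sketch of a script with a correct shebang. The function name report_shell and the script body are placeholders chosen for this example; they are not taken from the course's monitoring script.

    ```shell
    #!/bin/bash
    # When a file is executed directly (./script.sh), the kernel looks for "#!"
    # as the first two bytes and runs the file with the interpreter named there.
    # Without the leading "#", the first line is no longer an interpreter
    # directive; it is just an ordinary (and invalid) line of the script.

    # Placeholder body (an assumption, not the course's monitoring script).
    report_shell() {
      echo "running under: ${BASH:-unknown shell}"
    }

    report_shell
    ```

    Saving this as a file, marking it executable with chmod +x, and running it directly should report bash as the interpreter.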

    Do let me know about the issues and suggested changes...

  • Added shell scripting course under Linux Basics, Fundamentals Series

    Added a shell scripting course under Linux Basics (Fundamentals Series). Shell scripts are widely used by SREs to automate/schedule repeatable tasks. This course will impart the fundamentals of shell scripting, variables, functions, loops and arguments. These concepts can be implemented to create custom scripts. (Examples have been included). Thanks, Prasanjit
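
    As a rough illustration of the four concepts named above (variables, functions, loops, and arguments), here is a minimal sketch. The names greeting, GREETING, and greet_all are assumptions made for this example and do not come from the course itself.

    ```shell
    #!/bin/bash
    # Illustrative only: names and values below are assumptions, not course content.

    # Variable: a default value that can be overridden from the environment.
    greeting="${GREETING:-hello}"

    # Function: receives its arguments as "$@" and loops over them.
    greet_all() {
      local name
      for name in "$@"; do
        echo "$greeting, $name"
      done
    }

    # Arguments: "$#" holds the number of arguments passed to the script.
    if [ "$#" -gt 0 ]; then
      greet_all "$@"
    fi
    ```

    Run with one or more names as arguments, the script prints one greeting line per name; run without arguments, it prints nothing.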

  • Updated the position of course content section

    I moved the "Course Content" section below the "Prerequisites", and moved the title "Introduction" below "Prerequisites".

    Fixes issue #56

  • Consistency and CAP theorem in System Design

    Currently, we have Availability and Fault Tolerance in Level 101 but nothing about Consistency, which is also an important topic.

    With these three in the picture, we can also cover the CAP theorem, maybe under Level 102.

  • Add more topics to Signal

    Add more topics to Signal under Linux Advanced. Refer #136

    • Signal Groups: realtime and standard signals.
    • Signal Overview
    • improved few terms and sentences
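
    For readers unfamiliar with the topic, the sketch below shows a shell process handling a standard signal. It assumes bash on Linux; the handler name on_usr1 and the variable got_usr1 are hypothetical and not taken from the course material.

    ```shell
    #!/bin/bash
    # Standard signals have fixed names such as SIGTERM or SIGUSR1. On Linux,
    # real-time signals are the range SIGRTMIN..SIGRTMAX. In bash, `kill -l`
    # lists the signal names known to the shell.

    got_usr1=0

    # Hypothetical handler: record that SIGUSR1 arrived.
    on_usr1() {
      got_usr1=1
    }
    trap on_usr1 USR1

    # Send SIGUSR1 to this shell. BASHPID is used when available so the signal
    # targets the current shell even inside a subshell.
    kill -USR1 "${BASHPID:-$$}"

    echo "got_usr1=$got_usr1"
    ```

    The trap runs once the kill command completes, so the handler has already updated the variable by the time the final echo executes.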
  • Add SOS to Up-For-Grabs.net

    It would be good to add this repo to the project list at https://up-for-grabs.net/#/ for people who are new to open source. A lot of people have great skills and knowledge but get started with open source late, which makes it hard for them to contribute initially. SOS is a good place where they can contribute their knowledge and skills without much open source experience.

    Let me know if this is good to add, and whether someone is going to do it or you want me to.

  • how to integrate other languages version

    Hi folks, I've been translating this into Chinese, and have almost finished all the courses in level101 in my branch. I'll also continue to translate more courses. I just wonder if there's any guide about how to integrate the course in other languages to the current site?

    Thanks!

  • Linux Networking Fundamentals needs work

    In the intro.md prerequisites, it's stated that readers need to have knowledge of jargon in the TCP/IP stack such as DNS, HTTP, etc. It's not clear if these protocols are the jargon being referred to, or jargon associated with these protocols. Anyway, I would suggest that rather than worrying about jargon, readers should have a grounding in the basic principles of these protocols. I recommend providing a link to Peterson and Davie's Computer Networks: A Systems Approach, an open-source textbook covering the fundamentals of computer networking from a multi-layered, system of components perspective. I believe after reading (at least) the first three chapters, your readers will have enough of a grasp of computer networking fundamentals that they will better understand how to perform SRE networking tasks on Linux (and other) systems.
