Docker and Kubernetes One-Pager
Containers
- A container is a unit of deployment.
- It contains the code, the runtime, the system tools, and the system libraries.
- Move faster by deploying smaller units.
- Use fewer resources.
- Fit more onto the same host.
- Run them anywhere.
- Isolated: if one fails, it won't take the whole system down with it.
- Containers are ephemeral and stateless.
- You don't usually store data in containers.
- When a container is destroyed, so is the data inside it.
- Store data outside the container using a volume.
- A volume maps a folder on the host to a logical folder in the container.
Common Docker Commands
- `docker info` - display system information.
- `docker version` - display the Docker version.
- `docker login` - log in to a Docker registry.
- `docker pull [imagename]` - pull an image from a registry.
- `docker run [imagename]` - run a container.
- `docker run -d [imagename]` - run a container in detached mode.
- `docker start [containername]` - start a stopped container.
- `docker ps` - list running containers.
- `docker ps -a` - list running and stopped containers.
- `docker stop [containername]` - stop a container.
- `docker kill [containername]` - kill a container.
- `docker image inspect [imagename]` - get image info.
- `docker run -it nginx /bin/bash` - run a container and attach a shell.
- `docker run -it microsoft/powershell:nanoserver pwsh.exe` - run a container and attach PowerShell.
- `docker container exec -it [containername] bash` - attach a shell to a running container.
- `docker rm [containername]` - remove a stopped container.
- `docker rm $(docker ps -a -q)` - remove all stopped containers.
- `docker images` - list images.
- `docker rmi [imagename]` - delete an image.
- `docker system prune -a` - remove all images not in use by any container.
- `docker build -t [name:tag] .` - build an image using the Dockerfile in the current folder.
- `docker build -t [name:tag] -f [path/to/Dockerfile] .` - build an image using a Dockerfile in a different location.
- `docker tag [imagename] [name:tag]` - tag an existing image.
- `docker volume create [volumename]` - create a new volume.
- `docker volume ls` - list the volumes.
- `docker volume inspect [volumename]` - display volume info.
- `docker volume rm [volumename]` - delete a volume.
- `docker volume prune` - delete all volumes not mounted to a container.
Commands to map a volume
- `docker volume create myvol` - create a volume.
- `docker volume inspect myvol` - inspect the volume.
- `docker volume ls` - list the volumes.
- `docker run -d --name devtest -v myvol:/app nginx:latest` - run a container with the volume mapped to the /app logical folder. Code in the container sees it as a regular folder.
- `docker run -d --name devtest -v d:/test/:/app nginx:latest` - maps the logical folder /app to the host folder d:/test/ instead of a named volume.
- `docker inspect devtest` - inspect the container to see the volume path.
Common Docker Compose Commands
- `docker compose build` - build the images.
- `docker compose start` - start the containers.
- `docker compose stop` - stop the containers.
- `docker compose up -d` - build and start in detached mode.
- `docker compose ps` - list what's running.
- `docker compose rm` - remove stopped service containers.
- `docker compose down` - stop and remove containers and related resources.
- `docker compose logs` - get the logs.
- `docker compose exec [container] bash` - run a command in a running container.
- `docker compose --project-name test1 up -d` - run an instance as a named project. A project name lets you launch a second instance of your application from the same project folder.
- `docker compose -p test2 up -d` - shortcut for the above command.
- `docker compose ls` - list running projects.
- `docker compose cp [containerID]:[src_path] [dest_path]` - copy files from a container.
- `docker compose cp [src_path] [containerID]:[dest_path]` - copy files to a container.
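The commands above operate on a compose file in the project folder. A minimal sketch, assuming a hypothetical single-service app (the service name, port, and volume name are illustrative):

```yaml
# docker-compose.yml - minimal hypothetical example
services:
  web:
    image: nginx:latest
    ports:
      - "8080:80"        # host port 8080 -> container port 80
    volumes:
      - webdata:/app     # named volume mounted at /app

volumes:
  webdata:
```

With this file in place, `docker compose up -d` builds (if needed) and starts the `web` service, and `docker compose down` removes it.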
Restart Policy
- no
  - The default restart policy.
  - Does not restart a container under any circumstances.
- always
  - Always restarts the container until its removal.
- on-failure
  - Restarts a container if the exit code indicates an error.
- unless-stopped
  - Restarts a container irrespective of the exit code, but stops restarting once the container is stopped or removed.
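A restart policy is set per container, either with the `--restart` flag on `docker run` or with the `restart:` key in a compose file. A hypothetical fragment:

```yaml
# docker-compose.yml fragment - the service name is illustrative
services:
  web:
    image: nginx:latest
    restart: unless-stopped   # same effect as: docker run --restart unless-stopped
```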
Kubernetes
Process Isolation
- A container runtime provides low-level functionality to run processes in a container.
- Container runtimes use multiple types of Linux kernel namespaces to give each container an isolated view of the system.
- A running container appears like a completely separate system, with its own hostname, network, processes, and filesystem.
- A container has multiple kinds of isolation:
  - Mounted filesystems
  - Hostname and domain name
  - Interprocess communication
  - Process identifiers
  - Network devices
- Docker uses containerd as its container runtime.
- Container runtimes are low-level libraries; they do not provide a user interface, but they do expose an API to work and test with.
- containerd provides the ctr tool for testing.
Resource Limiting
- It is important to limit the resources (CPU, memory, network) a process consumes, as it can otherwise starve other processes from running correctly.
- Scheduler
  - In the Linux kernel, a scheduler keeps a list of all the processes.
  - It tracks which processes are ready to run and how much time each process has received.
  - It is designed to be as fair as possible: every process gets a chance to run.
  - However, you can change process priorities using policy.
- Control groups
  - To manage a container's use of CPU cores, control groups are used.
  - Control groups (cgroups) are a feature of the Linux kernel that manage process resource utilization.
  - Each resource type, such as CPU, memory, or a block device, can have an entire hierarchy of cgroups associated with it.
  - After a process is placed in a cgroup, the kernel automatically applies the controls from that group.
  - Docker supports CPU limits on containers, so it creates a cgroup for each container it runs.
  - The cgroup that Docker creates is named after the container ID.
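Docker exposes these cgroup controls as flags on `docker run`; a sketch with illustrative limit values (requires a running Docker daemon):

```shell
# Limit a container to 1.5 CPU cores and 512 MB of memory.
# Docker translates these flags into settings in the container's cgroup.
docker run -d --name limited --cpus="1.5" --memory=512m nginx:latest
```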
- Network namespaces
  - Linux network namespaces are used to make each container appear to have its own set of network interfaces, complete with separate IP addresses and ports.
  - Kubernetes adds an "overlay" network through which containers can communicate even when they are running on different hosts.
  - Why network isolation?
    - It is common to run a process as a web server, in which case it must choose a network interface and a port number.
    - Two processes cannot listen on the same interface and port.
    - New processes can show up at any time, and it is not possible to know in advance which port numbers they will use for communication.
    - To get around this, a separate virtual network interface is provided for each container.
    - This way, a process in a container can choose any port it wants; it will be listening on a network interface separate from processes in other containers.
Isolated Storage
- A running container has its own filesystem. Two running containers built from the same image each have their own filesystem. This is made possible by the overlay filesystem.
- An overlay filesystem has three parts:
  - The lower directory is where the "base" layer exists.
  - The upper directory has the "overlay" layer; writes land here.
  - The mount directory is where the unified filesystem is made available for use.
- Overlay filesystems are provided by a Linux kernel module, enabling very high performance.
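The three parts map directly onto the options of a manual overlay mount. A sketch, assuming hypothetical paths; this needs root, the overlay kernel module, and additionally a scratch `workdir` on the same filesystem as the upper directory:

```shell
# Merge a read-only lower layer with a writable upper layer at /merged.
mkdir -p /lower /upper /work /merged
mount -t overlay overlay \
  -o lowerdir=/lower,upperdir=/upper,workdir=/work \
  /merged
```

Files read from /merged come from /lower unless overridden in /upper; anything written to /merged lands in /upper, leaving the lower layer untouched.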
Practical Image Building Advice
- An overlay filesystem can have multiple lower directories, and merging these directories is performant.
- Breaking a container image into multiple layers therefore causes very little performance penalty.
- Enable reuse of layers.
- In general, you should assume that every line of a Dockerfile makes a new layer.
- You should also assume that information about every command executed is stored in the image metadata. As a result:
  - Perform multiple steps in a single RUN line, and make sure every RUN command cleans up after itself.
  - Don't use COPY to transfer large files or secrets into the image, even if you clean them up in a later RUN step.
  - Don't use ENV to store secrets, because the resulting values become part of the image metadata.
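A hypothetical Dockerfile fragment illustrating the single-RUN advice: the package cache is removed in the same RUN line that created it, so it never persists into any layer.

```dockerfile
FROM debian:bookworm-slim

# One RUN line = one layer. Chaining install and cleanup with &&
# means the apt cache is gone before the layer is committed.
RUN apt-get update && \
    apt-get install -y --no-install-recommends curl && \
    rm -rf /var/lib/apt/lists/*
```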
Cross-Cutting Concerns
- All container orchestration software, Kubernetes included, has to account for the following:
  - Dynamic scheduling
  - Distributed state
  - Multitenancy
  - Hardware isolation
- Dynamic scheduling means that new containers must be allocated to a server, and allocations can change due to configuration changes or failures.
- Distributed state means the entire cluster must keep track of what containers are running and where, even during hardware and network failures.
- Multitenancy means that multiple applications should be able to run in a single cluster, with isolation for security and reliability.
- Hardware isolation means clusters must be able to run in cloud environments and on regular servers alike.
Container Deployments to Kubernetes
- Kubernetes runs containers on a set of worker machines called nodes.
- Each node runs a kubelet service that interfaces with the underlying container runtime to start and monitor containers.
- Kubernetes has a set of software components that manage the worker nodes and their containers. They are deployed separately from the worker nodes and are collectively referred to as the control plane.
- This separation allows customization of each Kubernetes cluster.
- The API server is a critical component of the control plane. It provides the interface for cluster control and monitoring that cluster users and the other control plane components use.
- Three control plane nodes is the smallest number required to run a highly available cluster.
- `kubectl describe secret -n kube-system microk8s-dashboard-token` - displays the dashboard token secret in a MicroK8s cluster.
- `kubectl get all --all-namespaces` - lists resources across all namespaces.
Pods
- A Pod is the most basic resource in Kubernetes and is how we run containers.
- Each Pod can have one or more containers within it.
- The Pod is used to provide process isolation.
- Linux kernel namespaces are used at both the Pod and the container level:
  - mnt (mount points): each container has its own root filesystem; other mounts are available to all containers in the Pod.
  - uts (Unix time sharing): isolated at the Pod level.
  - ipc (interprocess communication): isolated at the Pod level.
  - pid (process identifiers): isolated at the container level.
  - net (network): isolated at the Pod level.
- The advantage of this approach is that multiple containers can act like processes on the same virtual host, using the localhost address to communicate, while still being based on separate container images.
- Following is an example of a YAML file to create a Pod running NGINX:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: nginx
```

- Specifying Pod as the kind tells Kubernetes exactly what resource we're creating.
- No namespace is specified in the metadata, so by default this Pod will be deployed in the default namespace.
- `kubectl apply -f /opt/nginx-pod.yaml` - adds the Pod to the cluster.
- `kubectl get pods -o wide` - gives the status of the Pods that have been deployed; `-o wide` also shows each Pod's IP address and node assignment.
- `kubectl describe pod nginx` - shows the status and event log for the Pod. Since no namespace is specified, the default namespace is used.
- The event log is the first place to look for issues when there are problems starting a container.
- `kubectl logs nginx` - displays the logs. This command always refers to a Pod, because Pods are the basic resource used to run containers.
- `kubectl logs` is the place to look if a container is pulled and started successfully but then crashes.
- `kubectl delete -f nginx-pod.yml` - deletes the Pod.
- If there are multiple resources defined in the YAML file, `kubectl` will delete all of them. It uses the resource names to perform the delete.
Deployments
- Creating a Pod directly does not provide the scalability and failover benefits that Kubernetes offers.
- The Pod is allocated to a node only on creation, with no re-allocation if the node fails.
- To get scalability and failover, we need a controller to manage the Pod for us.
- The most common controller is the Deployment.
- To create a Deployment, we provide a Pod template.
- The Deployment creates Pods matching the template with the help of a ReplicaSet.
- ReplicaSets are responsible for basic Pod management, including monitoring Pod status and performing failover.
- Deployments are responsible for tracking changes to the Pod template caused by configuration changes or container image updates.
- Deployments create ReplicaSets, so we only need to interact with Deployments, not ReplicaSets.
- Following is a sample Deployment YAML file:

```yaml
kind: Deployment
apiVersion: apps/v1
metadata:
  name: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        resources:
          requests:
            cpu: "100m"
```
- We must provide a unique name for this Deployment.
- The replicas field tells Kubernetes how many identical instances of the Pod we want running.
- The selector enables the Deployment to find its Pods.
- The content of matchLabels must exactly match the content of the template.metadata.labels field, or Kubernetes will reject the Deployment.
- The content of template.spec is used as the spec for any Pods created by this Deployment.
- We add a CPU resource request so that we can configure autoscaling later on.
- `kubectl apply -f nginx-deploy.yml` - creates the Deployment.
- `kubectl get deployment nginx` - tracks the status of the Deployment.
- When the Deployment is complete, three nginx Pods will be ready, available, and managed by this Deployment.
- The name of each Pod begins with the name of the Deployment, followed by a few random characters that identify the ReplicaSet, followed by a few more random characters that identify the Pod.
- `kubectl get replicasets` - displays the ReplicaSets created.
- The Deployment does not use Pod names to match itself to its Pods; it uses its selector to match the labels on the Pods.
- `kubectl get all -l app=nginx` - displays all resources whose labels match app=nginx.
- This design helps Kubernetes be self-healing.
- In case of a failure or a network split, Kubernetes must be able to look at the current state of the Pods and determine how to reach the desired state: any extra Pods created by the Deployment need to be shut down, or new Pods need to be started, to maintain the replica count. Using a selector avoids the need for the Deployment to remember all the Pods it has created, including Pods on failed nodes.
- The Deployment monitors its Pods and maintains the correct number of replicas; if we delete a Pod, it is automatically re-created.
- `kubectl delete pod <podname>` - deletes a Pod.
- If we change the number of replicas for the Deployment, the Pods are automatically updated.
- `kubectl scale --replicas=4 deployment nginx` - updates the number of replicas in the Deployment named nginx.
- Instead of scaling manually, we can also configure autoscaling based on load using a HorizontalPodAutoscaler.
- To configure autoscaling, we create a new resource that references our Deployment.
- The cluster then monitors the resources used by the Pods and reconfigures the Deployment as needed.
- The `kubectl autoscale` command can also be used to configure autoscaling. The equivalent YAML:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
```
- Here too, in the metadata section, we add the label app: nginx.
- Tagging the components of an application with consistent metadata like this is good practice: it helps others understand which resources go together and makes debugging easier.
- This configuration uses version 2 of the autoscaler API.
- Version 2 of the autoscaler supports multiple resource metrics.
- Each resource is used to calculate a vote on the desired number of Pods, and the largest number wins.
- The target of the autoscaler is specified using its API group, kind, and name, which is enough to uniquely identify any resource in a Kubernetes cluster.
- We then tell the autoscaler to monitor the CPU utilization of the Pods that belong to the Deployment.
- The autoscaler works to keep CPU utilization at 50 percent over the long run, scaling up or down as required.
- However, the number of replicas will never exceed 10 and will never fall below 1.
- `kubectl apply -f nginx-scaler.yml` - deploys the autoscaler.
- `kubectl get hpa` - displays the autoscaler's target reference, the current and desired resource utilization, and the maximum, minimum, and current number of replicas. hpa is an abbreviation of horizontalpodautoscaler.
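The same autoscaler can be created imperatively with `kubectl autoscale`, without writing a YAML file; a sketch matching the limits used above (assumes the nginx Deployment already exists):

```shell
# Create an HPA targeting the nginx Deployment:
# keep CPU at ~50%, between 1 and 10 replicas.
kubectl autoscale deployment nginx --cpu-percent=50 --min=1 --max=10
```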
Jobs and CronJobs
- For cases where we need to run a command once or on a schedule, we can use a Job.
- A Deployment ensures that any container that stops running is restarted.
- A Job, by contrast, can check the exit code of the main process and restart it only if the exit code is non-zero, indicating failure.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: sleep
spec:
  template:
    spec:
      containers:
      - name: sleep
        image: busybox
        command:
        - "/bin/sleep"
        - "30"
      restartPolicy: OnFailure
```

- The restartPolicy can be set to OnFailure, or to Never, in which case the container is not restarted regardless of the exit code.
- `kubectl apply -f sleep-job.yml` - deploys the Job.
- `kubectl get job` - displays the Job.
- `kubectl get pods` - displays the Pod that the Job created.
- A CronJob is a controller that creates Jobs on a schedule:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: sleep
spec:
  schedule: "0 3 * * *"   # minute hour day-of-month month day-of-week
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: sleep
            image: busybox
            command:
            - "/bin/sleep"
            - "30"
          restartPolicy: OnFailure
```
- `kubectl apply -f sleep-cronjob.yml` - creates the CronJob.
- The CronJob now exists in the cluster, but it does not immediately create a Job or a Pod.
- Instead, it creates a new Job each time its schedule is triggered.
StatefulSets
- In cases where we need to attach a Pod to specific persistent storage every time it starts or restarts, a StatefulSet can be used.
- A StatefulSet identifies each Pod with a number, starting at zero, and each Pod receives matching persistent storage.
- When a Pod must be replaced, the new Pod is assigned the same numeric identifier and is attached to the same storage.

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: sleep
spec:
  serviceName: sleep
  replicas: 2
  selector:
    matchLabels:
      app: sleep
  template:
    metadata:
      labels:
        app: sleep
    spec:
      containers:
      - name: sleep
        image: busybox
        command:
        - "/bin/sleep"
        - "3600"
        volumeMounts:
        - name: sleep-volume
          mountPath: /storagedir
  volumeClaimTemplates:
  - metadata:
      name: sleep-volume
    spec:
      storageClassName: longhorn
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 10Mi
```
- First we declare a serviceName to tie this StatefulSet to a Kubernetes Service.
- This connection is used to create a Domain Name Service (DNS) entry for each Pod.
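The Service named by serviceName must exist as a separate resource; for per-Pod DNS entries it is typically a headless Service (clusterIP set to None). A hypothetical sketch matching the sleep StatefulSet:

```yaml
# Headless Service backing the "sleep" StatefulSet (illustrative sketch).
apiVersion: v1
kind: Service
metadata:
  name: sleep
spec:
  clusterIP: None    # headless: DNS records point directly at the Pods
  selector:
    app: sleep
```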
A template is provided for the StatefulSet to request persistent storage using volumeClaimTemplates.
-
We also tell Kubernetes where to mount that storage in our container using mountPath.
-
kubectl apply -f sleep-set.yaml
deploys the StatefulSet. -
kubectl get statefulsets
displays the statefulsets. -
kubectl get pods
- displays two pods. -
The persistent storage for each Pod is brand new, so it starts empty.
-
kubectl exec
allows us to run commands inside a container.works no matter what host the container is on.
Daemon Sets
- The DaemonSet controller is like a StatefulSet in that it also runs a specific number of Pods, each with a unique identity.
- However, it runs exactly one Pod per node, which is useful for control plane and add-on components of a cluster, such as a network or storage plug-in.
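A minimal hypothetical DaemonSet, following the same busybox pattern as the earlier examples; note there is no replicas field, because the number of nodes determines the number of Pods:

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: sleep-daemon
spec:
  selector:
    matchLabels:
      app: sleep-daemon
  template:
    metadata:
      labels:
        app: sleep-daemon
    spec:
      containers:
      - name: sleep
        image: busybox
        command:
        - "/bin/sleep"
        - "3600"
```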