Learning notes of "do it yourself Docker" 2

Learning notes of "do it yourself Docker" 2

1 Preface

As my graduation project is related to the field of cloud native, I have been learning about Docker recently. Writing Docker by myself covers all kinds of knowledge at the bottom of Docker and provides all kinds of experiments and code demos. It is a good book for getting started with cloud native. This paper mainly records my experimental process and some summary feelings.

The experimental environment involved in this paper:

Vmware Workstation 16 builds Ubuntu 20 04 environment

The Linux kernel is 5.10 x

Go version 1.17.1

Chapter II Basic Technology

2.1 Linux Namespace

Linux Namespace is a function of Kernel, which can isolate a series of system resources, such as PID (process ID), User ID, Network, etc.

Currently, Linux implements six different types of namespaces.

The Namespace API mainly uses the following system calls

  • clone() creates a new process. Determine which types of namespaces are created according to the system call parameters, and their child processes will also be included in these namespaces.

  • unshare() moves the process out of a Namespace.

  • setns() adds the process to the Namespace.

UTS NameSpace

#Create main in the / src directory Go file for experiment
vim main.go

#write in
package main

import (
    "os/exec"
    "syscall"
    "os"
    "log"
)

func main() {
    cmd := exec.Command("sh")//Specifies the initialization process in the new process forked ()
    cmd.SysProcAttr = &syscall.SysProcAttr{
        //The clone has been encapsulated. Just call it directly
        Cloneflags: syscall.CLONE_NEWUTS,
    }
    cmd.Stdin = os.Stdin
    cmd.Stdout = os.Stdout
    cmd.Stderr = os.Stderr

    if err := cmd.Run(); err != nil {
        log.Fatal(err)
    }
}

#Exit and save
#Enter go run main Go into namespace space
go run main.go

IPC NameSpace

IPC Namespace is used to isolate System V IPC and POSIX message queue s Each IPC queue and POS system has its own IPC Namespace.

package main

import (
    "log"
    "os"
    "os/exec"
    "syscall"
)
func main()  {
    cmd := exec.Command("sh")
    cmd.SysProcAttr = &syscall.SysProcAttr{
        Cloneflags: syscall.CLONE_NEWUTS|
            syscall.CLONE_NEWIPC, #Add IPC namespace here
    }
    cmd.Stdin=os.Stdin
    cmd.Stdout=os.Stdout
    cmd.Stderr=os.Stderr

    if err :=cmd.Run(); err!=nil{
        log.Fatal(err)
    }
}

PID NameSpacce

PID Namespace is used to isolate the process id. The same process can have different PIDs in different PID namespaces.

It can be understood that in the docker container, we use ps -ef to find that the PID of the init process running in the foreground inside the container is 1, but outside the container, we use ps -ef to find that the same process has different PIDs, which is what PID namespace does.

Before modification

modify

package main

import (
    "log"
    "os"
    "os/exec"
    "syscall"
)

func main()  {
    cmd := exec.Command("sh")
    cmd.SysProcAttr = &syscall.SysProcAttr{
        Cloneflags : syscall.CLONE_NEWUTS|syscall.CLONE_NEWIPC|
            syscall.CLONE_NEWPID, #Add PID namespace here
    }
    cmd.Stdin = os.Stdin
    cmd.Stdout = os.Stdout
    cmd.Stderr = os.Stderr

    if err := cmd.Run();err!=nil{
        log.Fatal(err)
    }
}

After modification, enter the namespace

Mount NameSpace

In the previous Domo, you can't use top and ps to view temporarily because it will use / proc content

Linux On system/proc A directory is a file system, i.e proc File system. Unlike other common file systems,/proc It is a pseudo file system (i.e. virtual file system), which stores a series of special files of the current kernel running state. Users can view the information about the system hardware and the currently running process through these files, and even change the running state of the kernel by changing some of them.
be based on/proc Due to the particularity of the file system as mentioned above, the files in it are often called virtual files

Modify the code and add MNT namespace

package main

import (
    "log"
    "os"
    "os/exec"
    "syscall"
)

func main()  {
    cmd := exec.Command("sh")
    cmd.SysProcAttr = &syscall.SysProcAttr{
        Cloneflags: syscall.CLONE_NEWUTS|syscall.CLONE_NEWIPC|
            syscall.CLONE_NEWPID|syscall.CLONE_NEWNS,#Add namespace mnt here
    }
    cmd.Stdin = os.Stdin
    cmd.Stdout = os.Stdout
    cmd.Stderr = os.Stderr

    if err := cmd.Run();err!=nil{
        log.Fatal(err)
    }
}

Before unmounting / proc

Mount / proc

mount -t proc proc/proc
ps -ef

User NameSpace

User namespace is mainly the user Group ID of isolated users. In other words, the User ID and Group ID of a process can be different inside and outside the user namespace.
It is commonly used to create a User namespace by running as a non root user on the host computer, and then map it to root user in the User namespace.
This process has root permission in the User namespace, but it does not have root permission outside the User namespace.

Note here: CentOS turns off User namespace by default, and the kernel version is too high, which will make the code unable to run. Please confirm the kernel version and then modify it.

package main

import (
    "log"
    "os"
    "os/exec"
    "syscall"
)

func main() {
    cmd := exec.Command("sh")
    cmd.SysProcAttr = &syscall.SysProcAttr{
        Cloneflags: syscall.CLONE_NEWUTS | syscall.CLONE_NEWIPC | syscall.CLONE_NEWPID | syscall.CLONE_NEWNS |
            syscall.CLONE_NEWUSER, #Add User namespace here
    }
    cmd.SysProcAttr.Credential = &syscall.Credential{Uid: uint32(1), Gid: uint32(1)}
    cmd.Stdin = os.Stdin
    cmd.Stdout = os.Stdout
    cmd.Stderr = os.Stderr

    if err := cmd.Run(); err != nil {
        log.Fatal(err)
    }
    os.Exit(-1)
}


NetWork NameSpace

Network namespace is a namespace used to isolate network devices, IP addresses, ports and other network stacks. Network namespace allows each container to have its own independent network device (virtual), and the applications in the container can be bound to its own ports. The ports in each namespace will not conflict with each other. After the network bridge is built on the host, the communication between containers can be easily realized, and the applications in each container can use the same port.

package main

import (
    "log"
    "os"
    "os/exec"
    "syscall"
)

func main() {
    cmd := exec.Command("sh")
    cmd.SysProcAttr = &syscall.SysProcAttr{
        Cloneflags: syscall.CLONE_NEWUTS | syscall.CLONE_NEWIPC | syscall.CLONE_NEWPID | syscall.CLONE_NEWNS |
            syscall.CLONE_NEWUSER | syscall.CLONE_NEWNET,
    }
    cmd.SysProcAttr.Credential = &syscall.Credential{Uid: uint32(1), Gid: uint32(1)}
    cmd.Stdin = os.Stdin
    cmd.Stdout = os.Stdout
    cmd.Stderr = os.Stderr

    if err := cmd.Run(); err != nil {
        log.Fatal(err)
    }
    os.Exit(-1)
}

Host network

Intranamespace network

You can see that there are lo, eth0, eth1 and other network devices on the host, but there are no network devices in the Namespace. This shows the network isolation between the Network namespace and the host computer.

2.2 Cgroups

Linux Cgroups (Control Groups) provides the ability to limit, control and count the resources of a group of processes and future sub processes, including CPU, memory, storage, network, etc.

Three components in Cgroups

  • cgroup: a group of processes is often managed and associated with a group of subsystem s.
  • subsystem: a group of resource control modules, mainly including the restrictions on memory and CPU
  • hierarchy: string a group of cgroups into a tree structure, through which cgroups can inherit.

Using go language to limit container resources through cgroup

package main

import (
    "os/exec"
    "path"
    "os"
    "fmt"
    "io/ioutil"
    "syscall"
    "strconv"
)

const cgroupMemoryHierarchyMount = "/sys/fs/cgroup/memory"

func main() {
    if os.Args[0] == "/proc/self/exe" {
        //Container process
        fmt.Printf("current pid %d", syscall.Getpid())
        fmt.Println()
        cmd := exec.Command("sh", "-c", `stress --vm-bytes 200m --vm-keep -m 1`)
        cmd.SysProcAttr = &syscall.SysProcAttr{
        }
        cmd.Stdin = os.Stdin
        cmd.Stdout = os.Stdout
        cmd.Stderr = os.Stderr

        if err := cmd.Run(); err != nil {
            fmt.Println(err)
            os.Exit(1)
        }
    }

    cmd := exec.Command("/proc/self/exe")
    cmd.SysProcAttr = &syscall.SysProcAttr{
        Cloneflags: syscall.CLONE_NEWUTS | syscall.CLONE_NEWPID | syscall.CLONE_NEWNS,
    }
    cmd.Stdin = os.Stdin
    cmd.Stdout = os.Stdout
    cmd.Stderr = os.Stderr

    if err := cmd.Start(); err != nil {
        fmt.Println("ERROR", err)
        os.Exit(1)
    } else {
        //Get the pid of the process mapped in the external namespace from the fork
        fmt.Printf("%v", cmd.Process.Pid)

        // Create a cgroup on the Hierarchy with the memory subsystem attached by default
        os.Mkdir(path.Join(cgroupMemoryHierarchyMount, "testmemorylimit"), 0755)
        // Add the container process to this cgroup
        ioutil.WriteFile(path.Join(cgroupMemoryHierarchyMount, "testmemorylimit", "tasks") , []byte(strconv.Itoa(cmd.Process.Pid)), 0644)
        // Restrict cgroup process usage
        ioutil.WriteFile(path.Join(cgroupMemoryHierarchyMount, "testmemorylimit", "memory.limit_in_bytes") , []byte("100m"), 0644)
    }
    cmd.Process.Wait()
}

By configuring the Cgroups virtual file system, we limit the memory occupation of the stress process in the container to 100m.

2.3 Union File System

It mainly involves the following key contents:

  1. Copy on write (hereinafter referred to as CoW), also known as implicit sharing, is a resource management technology to achieve efficient replication of modifiable resources. Generally speaking, each write is to operate on the copy of the original file, and the original file will not be modified, and the subsequent modifications are to operate on this copy, that is, it is copied only when it is modified for the first time.
  2. The layered file system mirrored by Docker uses AUFS mechanism. AUFS has the advantages of fast starting container and efficient utilization of storage and memory. Each image has a tree structure, and the upper image uses the lower image.
  3. Docker uses Cow technology to implement image layer and reduce disk space occupation. Cow means that once only a small part of a file has been changed, the whole file also needs to be copied. However, don't worry too much. For each container, each image layer only needs to be copied once at most. Subsequent changes will be made on the container layer of the first copy.

Tags: Go Linux Docker Container Cloud Native

Posted by orangehairedboy on Sun, 17 Apr 2022 17:49:35 +0930