How to elegantly configure a deep learning environment from scratch on a server

Get the account and password first

Intranet: ssh root@12.34.56.78 -p 2212
External network: ssh root@12.34.56.78 -p 32212
Password: 654321

The account is actually root... but my senior labmate said the accounts are implemented with Docker (I don't know the details), so everyone's environments are isolated from each other and it doesn't matter.

Great: after logging in, I found a completely blank machine:

root@16ffc2fac0e4:~# ls
root@16ffc2fac0e4:~# python
-bash: python: command not found
root@16ffc2fac0e4:~# python3
Python 3.8.10 (default, Nov 26 2021, 20:14:08) 
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> 
root@16ffc2fac0e4:~# conda
-bash: conda: command not found
root@16ffc2fac0e4:~# git
-bash: git: command not found
root@16ffc2fac0e4:~#

Install Anaconda first

Go to https://www.anaconda.com/products/distribution#Downloads and check the download options under Linux.

There are four installers:
64-Bit (x86) Installer (737 MB)
64-Bit (Power8 and Power9) Installer (360 MB)
64-Bit (AWS Graviton2 / ARM64) Installer (534 MB)
64-bit (Linux on IBM Z & LinuxONE) Installer (282 MB)

Commands to view system information

# Check whether the system is 32-bit or 64-bit
root@16ffc2fac0e4:~# getconf LONG_BIT
64
# Linux kernel version
root@16ffc2fac0e4:~# cat /proc/version
Linux version 5.4.0-124-generic (buildd@lcy02-amd64-089) (gcc version 9.4.0 (Ubuntu 9.4.0-1ubuntu1~20.04.1)) #140-Ubuntu SMP Thu Aug 4 02:23:37 UTC 2022

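(The "amd64" in the kernel string does not mean a different architecture from x86: amd64 and x86_64 are two names for the same 64-bit x86 architecture. I don't know where I got the idea that they were different.) So let's choose the first one, 64-Bit (x86).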
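The same decision as a small Python sketch: map the machine name that `uname -m` reports (on Linux, Python's `platform.machine()` returns the same value) to one of the four installers. The `pick_installer` helper is hypothetical, and the non-x86 file names are an assumption: they are taken to follow the same `Anaconda3-2022.10-Linux-<arch>.sh` pattern as the x86_64 file used below.

```python
import platform
from typing import Optional

# Assumption: the non-x86 names follow the same pattern as the x86_64
# installer used in this post (Anaconda3-2022.10 release).
INSTALLERS = {
    "x86_64": "Anaconda3-2022.10-Linux-x86_64.sh",    # 64-Bit (x86)
    "amd64": "Anaconda3-2022.10-Linux-x86_64.sh",     # same architecture, other name
    "ppc64le": "Anaconda3-2022.10-Linux-ppc64le.sh",  # Power8 and Power9
    "aarch64": "Anaconda3-2022.10-Linux-aarch64.sh",  # AWS Graviton2 / ARM64
    "s390x": "Anaconda3-2022.10-Linux-s390x.sh",      # Linux on IBM Z & LinuxONE
}


def pick_installer(machine: Optional[str] = None) -> str:
    """Map a machine name (what `uname -m` prints) to an installer file name."""
    arch = (machine or platform.machine()).lower()
    if arch not in INSTALLERS:
        raise ValueError(f"no Anaconda installer listed for {arch!r}")
    return INSTALLERS[arch]


if __name__ == "__main__":
    try:
        print(pick_installer())  # installer for the machine this runs on
    except ValueError as err:
        print(err)
```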

The download does not need a VPN.

Upload the .sh installer to the server

scp -P 2212 Downloads/Anaconda3-2022.10-Linux-x86_64.sh root@12.34.56.78:/root
# scp can't resume an interrupted transfer, which is a bit annoying
# I'll look into rsync when I have time
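For resumable uploads, rsync's `--partial` flag keeps a half-transferred file on the server instead of deleting it, so rerunning the same command can reuse the data that already arrived rather than starting from zero. A minimal sketch, assuming rsync is installed on both machines (the bare server above may need it installed first); `rsync_upload_cmd` is a hypothetical helper that only builds the argument list.

```python
import shlex
from typing import List


def rsync_upload_cmd(local_path: str, host: str, remote_dir: str,
                     port: int = 22, user: str = "root") -> List[str]:
    """Build an rsync command that can simply be rerun to resume an upload."""
    return [
        "rsync",
        "-av",                   # archive mode, verbose
        "--partial",             # keep a partially transferred file
        "--progress",            # per-file progress (-P = --partial --progress)
        "-e", f"ssh -p {port}",  # non-default ssh port, like scp -P
        local_path,
        f"{user}@{host}:{remote_dir}",
    ]


if __name__ == "__main__":
    cmd = rsync_upload_cmd("Downloads/Anaconda3-2022.10-Linux-x86_64.sh",
                           "12.34.56.78", "/root", port=2212)
    # Paste into a terminal, or run with subprocess.run(cmd, check=True)
    print(shlex.join(cmd))
```

If the connection drops, running the same command again continues from the partial file on the server.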

Install Anaconda on the server

# Run the installer script
sh Anaconda3-2022.10-Linux-x86_64.sh

After the installation finishes, restart the shell: the conda command now exists, the prompt is prefixed with (base), and the python command works.

Configure password-free login and port mapping

Password-free login

  1. Generate public and private keys on the local machine

    1. Go to the .ssh directory in your home directory: cd ~/.ssh
    2. Run ssh-keygen -t rsa. After pressing Enter there are three prompts. The first asks for the file name (default id_rsa; type another name if you want to change it). The second and third ask for a passphrase and its confirmation, which you will have to enter whenever the key is used later. Usually it is left empty; set one if you have strict security requirements. Finally, two files are generated, id_rsa and id_rsa.pub: the one ending in .pub is the public key and the other is the private key.
    💡 If you already generated a key pair when setting up password-free login before, don't do it again (otherwise the existing files will be overwritten and your existing password-free logins will stop working)
  2. Add the public key to the server's ~/.ssh/authorized_keys file (append the contents of id_rsa.pub as one line)

    1. If the server has no .ssh folder or authorized_keys file, just create them yourself
  3. Configure the local ~/.ssh/config file

    # A name that helps you identify which machine this is
    Host server-in
      # Destination machine IP
      HostName 12.34.56.78
      # Username for ssh login
      User root
      # Port used by ssh (default 22)
      Port 2212
      # Path to the local private key file
      IdentityFile /Users/apple/.ssh/id_rsa

    # For the external network, add a similar server-out entry with Port 32212

    Now you can log in with ssh server-in

    With .ssh/config set up, scp also works with the alias: scp <local_file_path> server-in:<remote_file_path>, so the user name, IP, and port can all be omitted.
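The standard tool for step 2 is `ssh-copy-id server-in`, run on the local machine (it asks for the password once). If you would rather do it by hand on the server, the sketch below, a hypothetical `add_authorized_key` helper not taken from the original post, appends the key only once and sets the permissions sshd expects: its default StrictModes check rejects an authorized_keys file that other users can write to. `add_key.py` in the usage line is likewise a hypothetical file name.

```python
import os
from pathlib import Path
from typing import Optional, Union


def add_authorized_key(pubkey: str, home: Optional[Union[str, Path]] = None) -> bool:
    """Append one public key line to ~/.ssh/authorized_keys unless already present.

    Returns True if the key was added, False if it was already there.
    """
    key = pubkey.strip()
    if not key:
        raise ValueError("empty public key")
    home_dir = Path(home) if home is not None else Path.home()
    ssh_dir = home_dir / ".ssh"
    ssh_dir.mkdir(mode=0o700, exist_ok=True)  # create ~/.ssh if missing
    os.chmod(ssh_dir, 0o700)                  # also fixes an existing ~/.ssh
    auth = ssh_dir / "authorized_keys"

    text = auth.read_text() if auth.exists() else ""
    added = key not in (line.strip() for line in text.splitlines())
    if added:
        if text and not text.endswith("\n"):
            text += "\n"                      # don't glue the key onto the last line
        auth.write_text(text + key + "\n")
    os.chmod(auth, 0o600)                     # not writable by group/others
    return added


if __name__ == "__main__":
    import sys
    if len(sys.argv) == 2:
        print("added" if add_authorized_key(sys.argv[1]) else "already present")
    else:
        print('usage: python3 add_key.py "$(cat id_rsa.pub)"')
```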

Port Mapping

This lets you use the server's resources through jupyter lab/jupyter notebook in your local browser. Why stick to the command line when you can have something visual, right hhhh.

Previously I used ssh username@host -p port -L local_ip:local_port:remote_ip:remote_port and saved that command as a short alias in my bash config.

Now it can be configured in .ssh/config instead:

# A name that helps you identify which machine this is
Host server-in
  # Destination machine IP
  HostName 12.34.56.78
  # Username for ssh login
  User root
  # Port used by ssh (default 22)
  Port 2212
  # Path to the local private key file
  IdentityFile /Users/apple/.ssh/id_rsa
  # Port mapping (port forwarding): local 127.0.0.1:4321 -> server's 127.0.0.1:8894
  LocalForward 127.0.0.1:4321 127.0.0.1:8894
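If you keep several similar entries (server-in, server-out), rendering them from one function keeps them consistent. A sketch with a hypothetical `render_host` helper that writes each directive on its own line. Giving server-out no LocalForward is an assumption (two open sessions cannot both bind local port 4321), and the `~/.ssh/id_rsa` default relies on ssh_config accepting `~` in IdentityFile.

```python
from typing import Optional, Tuple


def render_host(alias: str, hostname: str, port: int, user: str = "root",
                identity_file: str = "~/.ssh/id_rsa",
                local_forward: Optional[Tuple[str, str]] = None) -> str:
    """Render one Host block for ~/.ssh/config (no trailing comments)."""
    lines = [
        f"Host {alias}",
        f"  HostName {hostname}",
        f"  User {user}",
        f"  Port {port}",
        f"  IdentityFile {identity_file}",
    ]
    if local_forward is not None:
        local, remote = local_forward  # e.g. ("127.0.0.1:4321", "127.0.0.1:8894")
        lines.append(f"  LocalForward {local} {remote}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    jupyter = ("127.0.0.1:4321", "127.0.0.1:8894")
    print(render_host("server-in", "12.34.56.78", 2212, local_forward=jupyter))
    print(render_host("server-out", "12.34.56.78", 32212), end="")
```

The printed text can be appended to ~/.ssh/config on the local machine.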

Run jupyter lab/jupyter notebook

Anaconda ships with Jupyter, so after installing it the jupyter command is available.

jupyter lab --port 8894
# Did not start (the usual http://127.0.0.1:8894/lab?token=xxx link never appeared)
# The error message said to add --allow-root (Jupyter refuses to run as root by default)

jupyter lab --allow-root --port 8894
# success

Open http://127.0.0.1:4321/lab?token=xxxxxx locally (while an ssh server-in session is open): success.
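Before opening the browser, it can help to check that the tunnel is actually listening. A minimal sketch using Python's standard `socket` module; `port_is_open` is a hypothetical helper. It only proves that the local end of the forward (port 4321) accepts connections: ssh listens there as soon as `ssh server-in` is connected, whether or not Jupyter is running on the server.

```python
import socket


def port_is_open(port: int, host: str = "127.0.0.1", timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, ...
        return False


if __name__ == "__main__":
    # 4321 is the local end of the LocalForward in ~/.ssh/config
    if port_is_open(4321):
        print("tunnel is up: open http://127.0.0.1:4321/lab?token=... in a browser")
    else:
        print("nothing on 127.0.0.1:4321: is an `ssh server-in` session open?")
```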

Install tmux

With tmux, code keeps running even after you log out of the server.

There is no tmux yet:

(base) root@16ffc2fac0e4:~# tmux
-bash: tmux: command not found

I found two ways to install it from the package repositories:

# centos
sudo yum install tmux
# ubuntu
sudo apt-get install tmux

The result was sudo: command not found (even sudo is missing?). Since we are logged in as root, sudo isn't needed anyway.

After a long detour, I found this in https://blog.csdn.net/SH_ke/article/details/118496704:

To get straight to the point: Ubuntu's package manager is apt-get, so there is no need to install yum. To install other packages, use the apt-get command.

Install tmux directly with apt-get

So I simply ran apt-get install tmux, and it worked.
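The detour boils down to two rules: pick the package manager from the distribution, and skip sudo when you are already root. A sketch that reads the standard `ID`/`ID_LIKE` fields of /etc/os-release; `parse_os_release` and `install_command` are hypothetical helpers, and the `-y` flag (answer yes to prompts) is an addition not used in the original commands.

```python
import os
from typing import Dict, List


def parse_os_release(text: str) -> Dict[str, str]:
    """Parse KEY=value lines as found in /etc/os-release."""
    info: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        info[key] = value.strip().strip('"').strip("'")
    return info


def install_command(package: str, os_release_text: str, is_root: bool) -> List[str]:
    """Build the install command for the distribution described by os-release."""
    info = parse_os_release(os_release_text)
    ids = {info.get("ID", "")} | set(info.get("ID_LIKE", "").split())
    if ids & {"debian", "ubuntu"}:
        cmd = ["apt-get", "install", "-y", package]  # Ubuntu/Debian
    elif ids & {"rhel", "centos", "fedora"}:
        cmd = ["yum", "install", "-y", package]      # CentOS/RHEL
    else:
        raise ValueError(f"unrecognized distribution: {sorted(ids)}")
    # root needs no sudo (and minimal Docker images often don't ship it)
    return cmd if is_root else ["sudo"] + cmd


if __name__ == "__main__":
    try:
        with open("/etc/os-release") as f:
            print(" ".join(install_command("tmux", f.read(), os.geteuid() == 0)))
    except (OSError, ValueError) as err:
        print(f"could not decide: {err}")
```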

Install common deep learning packages in the conda environment

💡 Large channels/packages install very slowly. If you're worried the connection will drop midway, start tmux now and run the installation inside a tmux session.

pytorch

As usual, see the install section of https://pytorch.org/.

The only question is which CUDA version to choose.

View GPU configuration on server

  1. Is there a GPU? Check with the nvidia-smi command

    (base) root@16ffc2fac0e4:~/yum-3.2.28# nvidia-smi 
    Wed Oct 19 17:17:04 2022       
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 510.47.03    Driver Version: 510.47.03    CUDA Version: 11.6     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |                               |                      |               MIG M. |
    |===============================+======================+======================|
    |   0  NVIDIA GeForce ...  On   | 00000000:1A:00.0 Off |                  N/A |
    |  0%   37C    P8    29W / 350W |      1MiB / 24576MiB |      0%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+
    |   1  NVIDIA GeForce ...  On   | 00000000:89:00.0 Off |                  N/A |
    |  0%   35C    P8    23W / 370W |      1MiB / 24576MiB |      0%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+
    |   2  NVIDIA GeForce ...  On   | 00000000:B1:00.0 Off |                  N/A |
    | 38%   34C    P8    19W / 350W |      1MiB / 24576MiB |      0%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+
    |   3  NVIDIA GeForce ...  On   | 00000000:B2:00.0 Off |                  N/A |
    | 38%   35C    P8    23W / 350W |      1MiB / 24576MiB |      0%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+
                                                                                   
    +-----------------------------------------------------------------------------+
    | Processes:                                                                  |
    |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
    |        ID   ID                                                   Usage      |
    |=============================================================================|
    |  No running processes found                                                 |
    +-----------------------------------------------------------------------------+
    
  2. CUDA version?

    Same command: the header line of the table says CUDA Version: 11.6. This is the newest CUDA version the installed driver supports, not necessarily a CUDA toolkit that is installed.
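The version can also be read from nvidia-smi programmatically. The sketch below (hypothetical helpers `driver_cuda_version` and `cudatoolkit_fits`) extracts the header value and applies a conservative rule of thumb: choose a cudatoolkit no newer than the version the driver reports. CUDA 11 also offers some minor-version compatibility, so treat the check as a safe default rather than a hard limit.

```python
import re
import shutil
import subprocess
from typing import Tuple


def driver_cuda_version(smi_output: str) -> str:
    """Return the 'CUDA Version' from nvidia-smi's header, e.g. '11.6'."""
    match = re.search(r"CUDA Version:\s*([\d.]+)", smi_output)
    if match is None:
        raise ValueError("no 'CUDA Version' field in nvidia-smi output")
    return match.group(1)


def _version_tuple(version: str) -> Tuple[int, ...]:
    return tuple(int(part) for part in version.split("."))


def cudatoolkit_fits(toolkit: str, driver_cuda: str) -> bool:
    """Conservative check: toolkit version not newer than the driver's CUDA version."""
    return _version_tuple(toolkit) <= _version_tuple(driver_cuda)


if __name__ == "__main__":
    if shutil.which("nvidia-smi") is None:
        print("nvidia-smi not found (no NVIDIA driver visible)")
    else:
        out = subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout
        cuda = driver_cuda_version(out)
        print(f"driver supports CUDA {cuda}; cudatoolkit=11.6 fits: {cudatoolkit_fits('11.6', cuda)}")
```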

Get the install command

OK, the PyTorch website now gives the command we need to run:

NOTE: 'conda-forge' channel is required for cudatoolkit 11.6
conda install pytorch torchvision torchaudio cudatoolkit=11.6 -c pytorch -c conda-forge
(base) root@16ffc2fac0e4:~/yum-3.2.28# conda install pytorch torchvision torchaudio cudatoolkit=11.6 -c pytorch -c conda-forge
Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Collecting package metadata (repodata.json): /
# Slow; collecting the metadata does finish if you wait long enough, and I don't want to use pip
# Downloading cudatoolkit is especially slow, maybe because I hadn't switched to a domestic mirror
# Gave up and interrupted with Ctrl+C: cudatoolkit and pytorch are both huge packages

Switch conda to a domestic mirror

  • Which channels are configured now?

    # conda info
    channel URLs : https://repo.anaconda.com/pkgs/main/linux-64
                   https://repo.anaconda.com/pkgs/main/noarch
                   https://repo.anaconda.com/pkgs/r/linux-64
                   https://repo.anaconda.com/pkgs/r/noarch
    
  • Add the Tsinghua mirror (https://zhuanlan.zhihu.com/p/47663391)

    conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
    conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/
    # The two above mirror Anaconda's official repositories
    conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge/
    # The one above mirrors the third-party conda-forge channel
    
    # for linux
    conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/pytorch/
    # for legacy win-64
    conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/peterjc123/
    # The two above mirror the third-party PyTorch channels
    
    conda config --set show_channel_urls yes
    
  • Reinstall:

    conda install pytorch torchvision torchaudio cudatoolkit=11.6
    # Maybe conda still isn't actually using the mirrors? Still very slow, but fortunately there's tmux
    

    CondaHTTPError: HTTP 000 CONNECTION FAILED for url <https://repo.anaconda.com/pkgs/main/linux-64/current_repodata.json>
    Elapsed: -

  • Remove the default channel (defaults): the error shows conda was still contacting repo.anaconda.com through it

    Solution based on https://zhuanlan.zhihu.com/p/260034241

    The installation then succeeded.
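Once the install finishes, a quick check from Python shows whether PyTorch can actually see the GPUs. A sketch using `torch.cuda.is_available()`, `torch.cuda.device_count()`, `torch.cuda.get_device_name()` and `torch.version.cuda`; `gpu_report` is a hypothetical wrapper that also runs on machines without torch or without a GPU. If the install worked on this server, `cuda_available` should be True and `device_count` should match the four GPUs listed by nvidia-smi.

```python
from typing import Any, Dict


def gpu_report() -> Dict[str, Any]:
    """Summarize what the installed PyTorch can see; works without a GPU too."""
    try:
        import torch
    except ImportError:
        return {"torch": None}
    count = torch.cuda.device_count()
    return {
        "torch": torch.__version__,
        "built_with_cuda": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
        "device_count": count,
        "devices": [torch.cuda.get_device_name(i) for i in range(count)],
    }


if __name__ == "__main__":
    for key, value in gpu_report().items():
        print(f"{key}: {value}")
```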

Other common packages

  • tqdm (progress bar)
  • to be added

Tags: Linux Deep Learning server

Posted by daebat on Mon, 24 Oct 2022 16:52:51 +1030