Introduction
So, you’ve gained access to a powerful remote server with some beefy GPUs, huh? This is where the real, large-scale machine learning happens. But for many, leaving a familiar graphical interface for the command line and tools like SSH can feel daunting.
I wrote this guide for my juniors at uni and for anyone else who finds it useful. We’ll cover everything from securely connecting to the server to managing your files, Python environments, and long-running training jobs like a pro.
Part 1: The Key to the Kingdom - SSH
SSH, or Secure Shell, is your gateway to the remote machine. It’s a secure protocol that lets you open a command-line interface on another computer as if you were sitting right in front of it.
Your First Connection
The fundamental command is ssh. To connect, you’ll need the server’s address (a domain name or IP address), your username, and possibly a specific port number.
# The general format is: ssh -X your_username@your_server_ip -p <port_number>
ssh -X user@remote.server.com -p 2222
Let’s break that down:
- ssh: The command itself.
- -X: Enables X11 forwarding. This lets you run graphical applications on the server and have their windows appear on your local machine. It needs an X server running locally (e.g., XQuartz on macOS), and you can drop the flag if you only need the terminal.
- user@remote.server.com: Your username at the server’s address.
- -p 2222: Specifies the port to connect to. The default SSH port is 22, but system administrators often change it for security, so make sure you use the one provided to you.
Going Passwordless with SSH Keys (Highly Recommended!)
Typing your password every time is tedious and less secure than using SSH keys. This method involves creating a cryptographic key pair: a private key that stays on your local machine (guard it with your life!) and a public key that you copy to the server.
- Generate Keys (on your local machine): If you haven’t already, run this command in your local terminal. When it asks for a file to save the key, just press Enter to accept the default. It’s good practice to add a passphrase for an extra layer of security.
ssh-keygen -t rsa -b 4096
- Copy Your Public Key to the Server: This command automatically appends your public key to the correct file (~/.ssh/authorized_keys) on the server.
# Adjust the port and user/host info as needed
ssh-copy-id -p 2222 user@remote.server.com
Now, try ssh-ing into your server again. If you set a passphrase, you’ll enter that; otherwise, it should log you in without asking for your password!
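To get a feel for what ssh-keygen produces, here is a sketch that creates a throwaway key pair in a temporary directory, so your real ~/.ssh is left alone. It uses the Ed25519 key type, which current OpenSSH versions support as a modern alternative to RSA. The file name demo_ed25519 and the comment demo-key are placeholders, and the empty passphrase (-N "") is only acceptable because this is a demo key.

```shell
# Generate a throwaway key pair in a temporary directory (demo only; real keys live in ~/.ssh)
keydir=$(mktemp -d)
keyfile="$keydir/demo_ed25519"

# -t: key type, -C: comment stored in the public key, -f: output file,
# -N "": empty passphrase (demo only!), -q: quiet
ssh-keygen -t ed25519 -C "demo-key" -f "$keyfile" -N "" -q

# Two files: the private key (never share it) and the .pub file (this is what ssh-copy-id copies)
ls -l "$keydir"
cat "$keyfile.pub"

# Show the key's fingerprint, handy for telling several keys apart
ssh-keygen -l -f "$keyfile.pub"
```

For a real Ed25519 key, run ssh-keygen -t ed25519 and accept the default path (~/.ssh/id_ed25519); ssh-copy-id -i ~/.ssh/id_ed25519.pub -p 2222 user@remote.server.com then copies that specific public key.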
Create SSH Shortcuts with a Config File
You can create a config file on your local machine to store connection details. This is a massive time-saver. Create the file ~/.ssh/config if it doesn’t exist (touch ~/.ssh/config) and add an entry like this:
Host my-server
    HostName remote.server.com
    User user
    Port 2222
    ForwardX11 yes
Now, you can connect to your server with a simple, memorable command:
ssh my-server
Part 2: Your Home on the Server - Essential Commands
Once you’re in, you need to know how to navigate and monitor the environment.
System and Resource Monitoring
- Change Your Password: If you’re using password authentication, this should be the first thing you do.
passwd
- Who’s Using the CPUs? (htop): A live, colorful, and user-friendly view of CPU and memory usage, and of which processes are running. Much easier to read than the older top command (press q to quit either one).
htop
- Who’s Using the GPUs? (nvidia-smi): The most important command for an ML practitioner. Use watch to run it repeatedly for a live view.
nvidia-smi
# Refresh every second for a live view (press Ctrl+C to exit)
watch -n 1 nvidia-smi
Understanding the nvidia-smi output:
- Fan / Temp: Fan speed (as a percentage of its maximum) and GPU temperature in °C. Keep an eye on the temperature; if it’s too high, the GPU might throttle.
- Pwr:Usage/Cap: How much power the GPU is drawing compared with its power cap.
- Memory-Usage: How much VRAM is in use out of the total. This is critical! If you get a “CUDA out of memory” error, this is the column to check.
- GPU-Util: Percentage of time over the last sample period during which the GPU was busy running work. Aim for high utilization during training; consistently low values often mean the GPU is waiting on data loading.
For GPU monitoring, I would also recommend installing gpustat, which lets you see usernames as well, handy if you’re sharing the server with others.
Disk Space Management
- How much space am I using? (du): Use du (disk usage) to check file or directory sizes. The ~ character is a shortcut for your home directory.
# Check the total size of your home directory in a human-readable format
du -sh ~
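# A super useful command to find the 10 largest files/folders under the current directory
# (-a includes individual files, not just folders; the first line is usually "." itself, i.e. the total)
du -ah . | sort -rh | head -n 10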
- How much space is left? (df): Use df (disk free) to see the total storage available on the disk partition.
# Show free space on the partition where your home directory is located
df -h ~
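du is great for folder totals, but when your quota fills up it is usually a handful of big files (checkpoints, datasets) to blame. A sketch using find, assuming a scratch directory with made-up file names so it runs anywhere: -size +1M matches files larger than 1 MiB, and -exec ls -lh {} + prints them with readable sizes.

```shell
# Demo: build a scratch directory with one small and one "large" file (names are made up)
scratch=$(mktemp -d)
dd if=/dev/zero of="$scratch/small.txt" bs=1024 count=10 2>/dev/null       # 10 KiB
dd if=/dev/zero of="$scratch/checkpoint.pt" bs=1024 count=2048 2>/dev/null  # 2 MiB

# List regular files (-type f) larger than 1 MiB, with human-readable sizes
find "$scratch" -type f -size +1M -exec ls -lh {} +
```

On the server you would point it at your home directory with a bigger threshold, e.g. find ~ -type f -size +1G -exec ls -lh {} +.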
Basic File and Folder Operations
- ls -lh: List files in a long, human-readable format.
- cd [directory]: Change directory (cd ~ goes home, cd .. goes up one level, cd - goes to the previous directory).
- pwd: Print working directory (shows you where you are).
- mkdir [directory_name]: Make a new directory.
- cp -r [source] [destination]: Copy a file or directory. Use -r (recursive) for directories.
- mv [source] [destination]: Move or rename a file or directory.
- rm [file]: Remove a file.
- rm -r [directory]: Remove a directory and all its contents. USE WITH EXTREME CAUTION! There is no Recycle Bin on the command line. This action is permanent.
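If these commands are new to you, a safe way to get comfortable is to try them in a throwaway directory. The sketch below does exactly that; the directory and file names are made up for illustration, and mkdir -p (create parent directories as needed) is one extra flag not covered above.

```shell
# Practise in a throwaway directory so nothing important is at risk
scratch=$(mktemp -d)
cd "$scratch"

mkdir -p project/data project/checkpoints    # -p: create parents as needed, no error if they exist
echo "lr=0.001" > project/config.txt         # a small file to play with
cp -r project project_backup                 # copy a whole directory (needs -r)
mv project/config.txt project/config_v1.txt  # rename a file
ls -lh project                               # long, human-readable listing
pwd                                          # where am I?
rm -r project_backup                         # permanent: double-check the path before pressing Enter

cd - > /dev/null                             # go back to the previous directory
```

For a safety net with rm, rm -ri asks for confirmation before removing each file.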
Part 3: Running Long Experiments
If you run a script that takes hours and your SSH connection drops, your shell sends it a hangup signal (SIGHUP) and the script is terminated. Here are two ways to prevent that.
Method 1: The Quick & Dirty Way (nohup and &)
Use nohup (no hang up) to make your script ignore the disconnect signal, and & to run it in the background.
# -u makes Python's output unbuffered, so your print statements reach training.log right away
nohup python -u my_train_script.py --batch_size 32 > training.log 2>&1 &
- nohup ... &: Runs the command in the background, immune to hangups.
- > training.log: Redirects the standard output (your print statements) to a file named training.log.
- 2>&1: Redirects the standard error stream to the same place as the standard output. This means both your print statements and any error messages will be saved in training.log.
- You can monitor the progress by “tailing” the log file (Ctrl+C stops tail, not your job):
tail -f training.log
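It also helps to note the job’s process ID, so you can check on it or stop it later. A minimal sketch, assuming a short shell loop as a stand-in for the hypothetical my_train_script.py so it runs anywhere; training.pid is a placeholder file name.

```shell
# Stand-in for a long training run: prints one line per "epoch" (replace with your python command)
nohup bash -c 'for epoch in 1 2 3; do echo "epoch $epoch done"; sleep 1; done' > training.log 2>&1 &

# $! holds the PID of the most recent background job; save it for later
echo $! > training.pid

# kill -0 sends no signal; it only checks whether the process still exists
if kill -0 "$(cat training.pid)" 2>/dev/null; then
    echo "job $(cat training.pid) is still running"
fi

# To stop the job early:
# kill "$(cat training.pid)"
```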
Method 2: The Professional Way (tmux)
A terminal multiplexer like tmux is a far more powerful and flexible solution. It lets you create persistent sessions that you can detach from and re-attach to later.
- Start a new named session:
tmux new -s my_ml_session
- Run your script: Inside the new tmux window, just run your command normally.
python my_train_script.py --batch_size 32
# You can see the output directly here
- Detach from the session: Press Ctrl+b, release, then press d (for detach). You’re now back in your normal shell, and the tmux session keeps running in the background. You can safely log out.
- List running sessions:
tmux ls
- Re-attach to the session: When you log back in later, just attach to your session to pick up right where you left off.
tmux attach -t my_ml_session
tmux is a complete game-changer for remote work. It’s worth learning a few more of its commands!
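A few more tmux commands come in handy once you start scripting it. The sketch below creates a detached session, types a command into it with send-keys, saves a snapshot of its output with capture-pane, and then closes it. The session name demo_session and the echo standing in for a training command are placeholders.

```shell
# Create a session in the background (-d: don't attach to it)
tmux new-session -d -s demo_session

# Type a command into the session and press Enter, as if you were attached
tmux send-keys -t demo_session 'echo "step 1 of training"' Enter
sleep 2                                                    # give the shell a moment to run it

tmux has-session -t demo_session && echo "session is alive"   # exit status 0 if it exists
tmux capture-pane -p -t demo_session > pane_snapshot.txt      # -p: print the pane's visible text
cat pane_snapshot.txt

tmux kill-session -t demo_session                          # close the session and what runs in it
```

In practice you would send something like python my_train_script.py --batch_size 32 instead of the echo, then run tmux attach -t demo_session to watch it. While attached, Ctrl+b then [ enters scroll mode (q leaves it), and Ctrl+b then c opens a new window in the same session.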
Part 4: Python Environments with Conda
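Never install your project packages into the system’s default Python! Different projects need different package versions, so sharing one installation leads to dependency conflicts (and on a shared server you usually can’t modify it anyway). Always create isolated environments for each of your projects using conda.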
You can install conda by following the instructions on the Installing Anaconda Distribution page.
- Create an environment:
conda create --name my-project-env python=3.12 git
- Activate the environment: You must do this every time you start a new session.
conda activate my-project-env
- Install packages:
conda install numpy pandas matplotlib
- Install PyTorch with CUDA: This is critical. Go to the official PyTorch website and use their command generator to get the correct command for your server’s CUDA version. The header of the nvidia-smi output shows the highest CUDA version your driver supports.
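# Example for CUDA 12.1 - ALWAYS CHECK THE WEBSITE FOR THE LATEST COMMAND!
# (Recent PyTorch releases are installed with pip rather than from the pytorch conda channel; the website shows the current command.)
conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia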
- Deactivate:
conda deactivate
- List and Delete Environments:
conda env list
conda remove --name my-project-env --all
- Best Practice: Use Environment Files: To make your research reproducible, save your environment’s dependencies to a file.
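# Export your current environment to a file
conda env export > environment.yml
# Alternative: export only the packages you explicitly installed (usually more portable across operating systems)
# conda env export --from-history > environment.yml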
# Re-create the exact same environment on another machine (or for a friend)
conda env create -f environment.yml
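- Using Your Environment in Jupyter: To make your Conda environment available as a kernel in Jupyter, you need ipykernel.
# Run this inside your activated environment
conda install ipykernel
# Optionally register the kernel so a Jupyter server started from another environment lists it
python -m ipykernel install --user --name my-project-env --display-name "Python (my-project-env)"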
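Once PyTorch is installed, it is worth confirming two things: that the interpreter running is the one from your environment, and that PyTorch can see the GPUs. A minimal sketch, assuming the environment is activated; check_env.py is a placeholder name, and the script only uses standard PyTorch attributes (torch.__version__, torch.version.cuda, torch.cuda.is_available(), torch.cuda.device_count()). It prints a message instead of crashing if PyTorch is not installed yet.

```shell
# Write a small check script (hypothetical name); run it with your environment activated
cat > check_env.py <<'EOF'
import sys

# Should point inside your conda environment, e.g. .../envs/my-project-env/bin/python
print("interpreter:", sys.executable)

try:
    import torch
except ImportError:
    print("PyTorch is not installed in this environment")
else:
    print("torch version:", torch.__version__)
    print("built for CUDA:", torch.version.cuda)  # None for CPU-only builds
    print("CUDA available:", torch.cuda.is_available())
    print("visible GPUs:", torch.cuda.device_count())
EOF

python check_env.py
```

If CUDA available prints False on a GPU machine, the usual suspects are a CPU-only PyTorch build (built for CUDA: None) or a build for a newer CUDA version than your driver supports.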
Part 5: Transferring Files
You’ll constantly need to move datasets, code, and results between your local machine and the server.
scp (Secure Copy)
scp works like cp but over the network. It’s simple for single files or small directories. We use the shortcut my-server that we defined in the SSH config file.
# Copy a local file TO the remote server
# Note: scp uses a capital -P for port, but we don't need it when using a config file.
scp /path/to/local/file.txt my-server:~/remote/destination/
# Copy a remote folder TO your local machine (using the -r flag for recursive)
scp -r my-server:~/path/on/server/results/ ./local/destination/
rsync (The Superior Choice)
rsync is faster and more powerful. It only transfers the differences between files, which is incredibly efficient for syncing project directories where you’ve only changed a few code files.
# Sync a local project folder TO the remote server using the config file alias
# -a: archive mode (preserves permissions, etc.)
# -v: verbose (shows which files are being transferred)
# -z: compresses data during transfer
# --progress: shows a progress bar for large files
rsync -avz --progress /path/to/local/project/ my-server:~/path/to/remote/project/
# To make the remote directory an exact mirror (deleting files on the server that aren't on local), add --delete
rsync -avz --progress --delete /path/to/local/project/ my-server:~/path/to/remote/project/
Part 6: The Best of Both Worlds - VS Code Remote SSH
The VS Code “Remote - SSH” extension allows you to use your slick VS Code interface on your local machine with all the extensions you’ve set up, while all the code editing, execution, and terminal commands happen on the powerful remote server.
Step 1: Prerequisites
- Install Visual Studio Code on your local machine.
- In VS Code, go to the Extensions view (click the icon on the left sidebar).
- Search for and install the “Remote - SSH” extension by Microsoft.
Step 2: Connecting to Your Server
The “Remote - SSH” extension will automatically read your ~/.ssh/config file. This is the easiest way to connect.
- Open the Command Palette: Ctrl+Shift+P (or Cmd+Shift+P on Mac).
- Type Remote-SSH: Connect to Host... and press Enter.
- You should see the alias you created earlier (my-server) in the list. Select it. If you don’t see it, you can manually enter the SSH command in the format ssh my-server or the full command discussed earlier.
- If prompted, enter your SSH key passphrase or password for the remote server.
A new VS Code window will open. It will take a moment to connect and install a lightweight “VS Code Server” on the remote machine. Once connected, look at the green button in the bottom-left corner. It should say SSH: my-server, confirming you are connected!
Step 3: Your Remote Workspace
You are now controlling the remote server from within VS Code.
- Open a Folder: Go to File > Open Folder..., and the file dialog that opens shows the filesystem of your remote server. Navigate to your project directory (e.g., ~/path/to/remote/project/) and open it.
- Integrated Terminal: Open a new terminal with Ctrl+` (backtick) or Terminal > New Terminal. This is a terminal on your remote server. You can run gpustat, htop, conda activate my-project-env, and any other command right here.
- Editing Files: Simply click on files in the Explorer sidebar to open and edit them. The changes are saved directly on the remote server. No more rsync-ing your code after every small change!
Step 4: Setting Up Your Python Environment
This is the most critical step. You need to tell VS Code which Python interpreter (your Conda environment) to use for code completion, linting, and execution.
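- Open the Command Palette (Ctrl+Shift+P / Cmd+Shift+P).
- Type Python: Select Interpreter. This command comes from Microsoft’s Python extension, which needs to be installed on the remote host as well; the Extensions view lets you install it there.
- A list of Python interpreters found on your remote server will appear. Find and select the one corresponding to your Conda environment (e.g., ~/conda/envs/my-project-env/bin/python). The exact prefix depends on where conda was installed (such as ~/miniconda3 or ~/anaconda3); running which python with the environment activated prints the full path.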
Now, VS Code is fully aware of your project’s dependencies. You’ll get smart auto-completion and error checking based on the packages you installed in that environment.
Step 5: The Modern ML Workflow in Action
With everything set up, your workflow becomes incredibly smooth:
- Jupyter Notebooks: Open any .ipynb file on your remote server. VS Code’s Jupyter support (provided by Microsoft’s Jupyter extension) will activate. You can run cells, and the computation will be executed by the Python kernel from your selected Conda environment on the powerful remote machine.
- Git Integration: Although I would recommend using the command line, you can also use VS Code’s source control panel to stage, commit, push, and pull changes with Git, with all commands running directly on the remote server.
Conclusion
This guide has taken you from the fundamentals of SSH and the command line to a professional, hybrid workflow using VS Code.
Happy training!
PS: I keep discovering new things about Linux and remote servers lol. Please share if you have any tips or tricks that I missed!