Remote SSH Servers – Devansh’s Blog

Introduction

So, you’ve gained access to a powerful remote server with some beefy GPUs, huh? This is where the real, large-scale machine learning happens. But for many, leaving a familiar graphical interface for the command line and tools like SSH can feel daunting.

I wrote this guide for my juniors at uni and for anyone else who finds it useful. We’ll cover everything from securely connecting to the server to managing your files, Python environments, and long-running training jobs like a pro.

Part 1: The Key to the Kingdom - SSH

SSH, or Secure Shell, is your gateway to the remote machine. It’s a secure protocol that lets you open a command-line interface on another computer as if you were sitting right in front of it.

Your First Connection

The fundamental command is ssh. To connect, you’ll need the server’s address (a domain name or IP address), your username, and possibly a specific port number.

# The general format is: ssh -X your_username@your_server_ip -p <port_number>
ssh -X user@remote.server.com -p 2222

Let’s break that down:

ssh: The command itself.
-X: Enables X11 forwarding. This lets you run graphical applications on the server and have their windows appear on your local machine.
user@remote.server.com: Your username at the server’s address.
-p 2222: Specifies the port to connect to. The default SSH port is 22, but system administrators often change it for security, so make sure you use the one provided to you.

Going Passwordless with SSH Keys (Highly Recommended!)

Typing your password every time is tedious and less secure than using SSH keys. This method involves creating a cryptographic key pair: a private key that stays on your local machine (guard it with your life!) and a public key that you copy to the server.

Generate Keys (on your local machine): If you haven’t already, run this command in your local terminal. When it asks for a file to save the key, just press Enter to accept the default. It’s good practice to add a passphrase for an extra layer of security.

ssh-keygen -t rsa -b 4096

Copy Your Public Key to the Server: This command automatically appends your public key to the correct file (~/.ssh/authorized_keys) on the server.

# Adjust the port and user/host info as needed
ssh-copy-id -p 2222 user@remote.server.com

Now, try ssh-ing into your server again. If you set a passphrase, you’ll enter that; otherwise, it should log you in without asking for your password!
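If you want to see what a key pair actually looks like before touching your real keys, here is a small sketch that generates a throwaway pair in a scratch directory. The file name id_demo, the comment string, and the empty passphrase are assumptions for the demo only; for a real key, keep the default location and set a passphrase.

```shell
# Generate a throwaway key pair in a scratch directory (demo only).
demo_dir="$(mktemp -d)"

if command -v ssh-keygen >/dev/null 2>&1; then
    # -f: output file, -N "": empty passphrase (demo only), -q: quiet, -C: comment
    ssh-keygen -t rsa -b 4096 -f "$demo_dir/id_demo" -N "" -q -C "demo-key"

    # Two files appear: id_demo (private, never share it) and id_demo.pub (public, goes on the server)
    ls -l "$demo_dir"

    # Print the fingerprint of the public key
    ssh-keygen -l -f "$demo_dir/id_demo.pub"
else
    echo "ssh-keygen not found; install an OpenSSH client first"
fi
```

Only the .pub file is ever copied to a server. When you are done, remove the scratch directory with rm -r "$demo_dir".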

Create SSH Shortcuts with a Config File

You can create a config file on your local machine to store connection details. This is a massive time-saver. Create the file ~/.ssh/config if it doesn’t exist (touch ~/.ssh/config) and add an entry like this:

    Host my-server
        HostName remote.server.com
        User user
        Port 2222
        ForwardX11 yes

Now, you can connect to your server with a simple, memorable command:

ssh my-server
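Before relying on a new entry, it can help to check how ssh will interpret it. The sketch below writes the same example entry to a scratch file (so your real ~/.ssh/config is untouched), tightens its permissions, and asks ssh to print the settings it would use without connecting. The scratch file is an assumption for the demo; on your machine you would simply run chmod 600 ~/.ssh/config and ssh -G my-server.

```shell
# Write the example entry to a scratch file so the real ~/.ssh/config is untouched.
cfg="$(mktemp)"
cat > "$cfg" <<'EOF'
Host my-server
    HostName remote.server.com
    User user
    Port 2222
    ForwardX11 yes
EOF

# ssh expects the config file to be writable only by you.
chmod 600 "$cfg"

# -F picks the config file; -G prints the resolved settings for a host without connecting.
if command -v ssh >/dev/null 2>&1; then
    ssh -F "$cfg" -G my-server | grep -E '^(hostname|user|port|forwardx11) ' \
        || echo "could not print the resolved settings (older ssh versions lack -G)"
fi
```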

Part 2: Your Home on the Server - Essential Commands

Once you’re in, you need to know how to navigate and monitor the environment.

System and Resource Monitoring

Change Your Password: If you’re using password authentication, this should be the first thing you do.

passwd

Who’s Using the CPUs? (htop): A live, colorful, and user-friendly view of CPU and memory usage, and which processes are running. Far superior to the older top command.

htop

Who’s Using the GPUs? (nvidia-smi): The most important command for an ML practitioner. Use watch to run it repeatedly for a live view.

nvidia-smi
watch -n 1 nvidia-smi   # live view, refreshed every second (Ctrl+C to exit)

Understanding nvidia-smi output:

Fan / Temp: Fan speed and GPU temperature. Keep an eye on the temperature; if it’s too high, the GPU might be throttling.
Pwr:Usage/Cap: How much power the GPU is drawing relative to its power limit.
Memory-Usage: How much VRAM is in use out of the total. This is critical! If you get a “CUDA out of memory” error, this is where you check how much memory is already taken.
GPU-Util: Percentage of time the GPU cores were active. Aim for high utilization during training.

For GPU monitoring, I would also recommend installing gpustat, which allows you to see usernames as well if you’re sharing the server with others.
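If you can’t install anything extra, nvidia-smi’s query flags get you most of the way there. This is a rough sketch, not a replacement for gpustat: the function name gpu_summary is made up, and the owner lookup assumes ps on the server can see the other users’ processes.

```shell
# Print a compact per-GPU summary plus the owner of each process holding GPU memory.
gpu_summary() {
    if ! command -v nvidia-smi >/dev/null 2>&1; then
        echo "nvidia-smi not found (no NVIDIA driver on this machine?)"
        return 0
    fi

    # One CSV line per GPU
    nvidia-smi --query-gpu=index,name,memory.used,memory.total,utilization.gpu,temperature.gpu \
               --format=csv

    # One CSV line per compute process: PID and the memory it holds
    nvidia-smi --query-compute-apps=pid,used_memory --format=csv,noheader |
    while IFS=, read -r pid mem; do
        [ -n "$pid" ] || continue
        mem="${mem# }"                                  # drop the space after the comma
        owner="$(ps -o user= -p "$pid" 2>/dev/null)"    # which user owns this PID
        echo "pid=$pid user=${owner:-unknown} mem=$mem"
    done
}

gpu_summary
```

Wrap it in watch if you want it live, the same way as plain nvidia-smi.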

Disk Space Management

How much space am I using? (du): Use du (disk usage) to check file or directory sizes. The ~ character is a shortcut for your home directory.

# Check the total size of your home directory in a human-readable format
du -sh ~

# A super useful command to find the 10 largest files/folders under your current directory
# (-a makes du list files too, not just folders)
du -ah . | sort -rh | head -n 10

How much space is left? (df): Use df (disk free) to see the total storage available on the disk partition.

# Show free space on the partition where your home directory is located
df -h ~
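To get a feel for how du and sort work together, you can try them on a scratch directory with files of known sizes. The directory layout and file names below are made up for the demo.

```shell
# Build a small scratch tree with files of known sizes.
scratch="$(mktemp -d)"
mkdir -p "$scratch/data" "$scratch/logs"
head -c 3000000 /dev/urandom > "$scratch/data/big.bin"   # roughly 3 MB
head -c 1000000 /dev/urandom > "$scratch/data/mid.bin"   # roughly 1 MB
echo "epoch 1 done" > "$scratch/logs/train.log"          # a few bytes

# -a lists files as well as directories; sort -rh orders human-readable sizes, largest first
du -ah "$scratch" | sort -rh | head -n 5

# -d 1 stops at one level, giving one total per top-level folder
du -h -d 1 "$scratch" | sort -rh
```

The first lines of the first listing are the directories themselves (their totals include everything inside), followed by the individual files. The second form is usually the quickest way to spot which folder is eating your quota.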

Basic File and Folder Operations

ls -lh: List files in a long, human-readable format.
cd [directory]: Change directory. (cd ~ goes home, cd .. goes up one level, cd - goes to the previous directory).
pwd: Print working directory (shows you where you are).
mkdir [directory_name]: Make a new directory.
cp -r [source] [destination]: Copy a file or directory. Use -r (recursive) for directories.
mv [source] [destination]: Move or rename a file or directory.
rm [file]: Remove a file.
rm -r [directory]: Remove a directory and all its contents. USE WITH EXTREME CAUTION! There is no Recycle Bin on the command line. This action is permanent.
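If these commands are new to you, practise them in a throwaway directory first. A minimal sketch, with made-up file and folder names:

```shell
# Practise in a throwaway directory so mistakes cost nothing.
sandbox="$(mktemp -d)"
cd "$sandbox"

mkdir -p project/results                  # -p also creates missing parent directories
echo "lr: 0.001" > project/config.yaml    # create a small file

cp -r project project_backup              # copy the whole directory
mv project/config.yaml project/train_config.yaml   # rename a file

ls -lh project project_backup

ls -R project_backup                      # look at what you are about to delete...
rm -r project_backup                      # ...then delete it (no undo!)
```

Listing a directory right before removing it is a cheap habit that catches most wrong-path mistakes.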

Part 3: Running Long Experiments

If you run a script that takes hours and your SSH connection drops, the script is normally killed along with your session. Here are two ways to prevent that.

Method 1: The Quick & Dirty Way (`nohup` & `&`)

Use nohup (no hang up) to make your script ignore the disconnect signal, and & to run it in the background.

nohup python my_train_script.py --batch_size 32 > training.log 2>&1 &

nohup ... &: Runs the command in the background, immune to hangups.
> training.log: Redirects the standard output (your print statements) to a file named training.log.
2>&1: Redirects the standard error stream to the same place as the standard output. This means both your print statements and any error messages will be saved in training.log.
You can monitor the progress by “tailing” the log file:

tail -f training.log
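One thing the nohup approach doesn’t give you for free is a handle on the job afterwards. A common trick is to save the process ID right after launching. The sketch below uses a short stand-in command instead of a real training script, and the log and PID file locations are just assumptions for the demo.

```shell
log="$(mktemp)"

# Stand-in for a training script: writes to stdout and stderr, then finishes.
nohup sh -c 'echo "epoch 1"; echo "a warning" >&2; sleep 1; echo "done"' > "$log" 2>&1 &

# $! holds the PID of the most recent background job; save it so you can find the job later
job_pid=$!
echo "$job_pid" > "$log.pid"

# kill -0 sends no signal; it only checks whether the process still exists
if kill -0 "$job_pid" 2>/dev/null; then
    echo "job $job_pid is running"
fi

# To stop a real job early you would run: kill "$job_pid"
wait "$job_pid"    # demo only: block until the stand-in job finishes
cat "$log"         # both the stdout and the stderr lines ended up in the log
```

After logging back in, kill -0 with the saved PID tells you whether the job is still alive, and kill with the same PID stops it.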

Method 2: The Professional Way (`tmux`)

A terminal multiplexer like tmux is a far more powerful and flexible solution. It lets you create persistent sessions that you can detach from and re-attach to later.

Start a new named session:

tmux new -s my_ml_session

Run your script: Inside the new tmux window, just run your command normally.

python my_train_script.py --batch_size 32
# You can see the output directly here

Detach from the session: Press the key combination Ctrl+b, then release and press d (for detach). You’re now back in your normal shell, and the tmux session is running in the background. You can safely log out.

List running sessions:

```
tmux ls
```

Re-attach to the session: When you log back in later, just attach to your session to pick up right where you left off.

```
tmux attach -t my_ml_session
```

tmux is a complete game-changer for remote work. It’s worth learning a few more of its commands!
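As a taste of those extra commands, tmux can also be driven without ever attaching, which is handy in scripts. This is a sketch under assumptions: the session name demo_session and the stand-in command are made up, and the function simply reports if tmux isn’t installed.

```shell
tmux_demo() {
    if ! command -v tmux >/dev/null 2>&1; then
        echo "tmux not found on this machine"
        return 0
    fi

    session="demo_session"   # hypothetical session name

    # -d starts the session detached, running the given command inside it
    tmux new-session -d -s "$session" 'echo "training started"; sleep 30'

    tmux ls                               # list sessions
    sleep 1
    tmux capture-pane -p -t "$session"    # print what is currently on the session's screen
    tmux kill-session -t "$session"       # end the session when you no longer need it
}

tmux_demo
```

Inside an attached session, a few default key bindings worth knowing: Ctrl+b then c opens a new window, Ctrl+b then % splits the pane side by side, and Ctrl+b then [ lets you scroll back through the output (press q to leave).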

Part 4: Python Environments with Conda

Never use the system’s default Python! It will lead to dependency conflicts. Always create isolated environments for each of your projects using conda.

You can install conda by following the instructions on the Installing Anaconda Distribution page.

Create an environment:

conda create --name my-project-env python=3.12 git

Activate the environment: You must do this every time you start a new session.

conda activate my-project-env

Install packages:

conda install numpy pandas matplotlib

Install PyTorch with CUDA: This is critical. Go to the official PyTorch website and use their command generator to get the correct command for your server’s CUDA version.

# Example for CUDA 12.1 - ALWAYS CHECK THE WEBSITE FOR THE LATEST COMMAND!
# (For recent PyTorch releases the website may give you a pip command instead of a conda one.)
conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia

Deactivate:

conda deactivate

List and Delete Environments:

conda env list
conda remove --name my-project-env --all

Best Practice: Use Environment Files: To make your research reproducible, save your environment’s dependencies to a file.

# Export your current environment to a file
conda env export > environment.yml

# Re-create the exact same environment on another machine (or for a friend)
conda env create -f environment.yml

Using Your Environment in Jupyter: To make your Conda environment available as a kernel in Jupyter, you need ipykernel.

# Run this inside your activated environment
conda install ipykernel
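If you launch Jupyter Notebook or JupyterLab from a different environment, you typically also need to register this environment as a named kernel so it shows up in the kernel list. A minimal sketch, assuming the environment is already activated; the function name register_kernel and the display name format are just illustrative.

```shell
# Register the active environment as a Jupyter kernel for the current user.
register_kernel() {
    env_name="${1:-my-project-env}"
    if ! python -c "import ipykernel" >/dev/null 2>&1; then
        echo "ipykernel is not installed in the active environment"
        return 0
    fi
    # --user installs the kernel spec into your home directory (no admin rights needed)
    python -m ipykernel install --user --name "$env_name" --display-name "Python ($env_name)"
}
```

Call it as register_kernel my-project-env after conda activate my-project-env. The kernel then appears under the display name in Jupyter’s kernel picker.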

Part 5: Transferring Files

You’ll constantly need to move datasets, code, and results between your local machine and the server.

`scp` (Secure Copy)

scp works like cp but over the network. It’s simple for single files or small directories. We use the shortcut my-server that we defined in the SSH config file.

# Copy a local file TO the remote server
# Note: scp uses a capital -P for port, but we don't need it when using a config file.
scp /path/to/local/file.txt my-server:~/remote/destination/

# Copy a remote folder TO your local machine (using the -r flag for recursive)
scp -r my-server:~/path/on/server/results/ ./local/destination/

`rsync` (The Superior Choice)

rsync is faster and more powerful. It only transfers the differences between files, which is incredibly efficient for syncing project directories where you’ve only changed a few code files.

# Sync a local project folder TO the remote server using the config file alias
# -a: archive mode (preserves permissions, etc.)
# -v: verbose (shows which files are being transferred)
# -z: compresses data during transfer
# --progress: shows a progress bar for large files
rsync -avz --progress /path/to/local/project/ my-server:~/path/to/remote/project/

# To make the remote directory an exact mirror (deleting files on the server that aren't on local), add --delete
rsync -avz --progress --delete /path/to/local/project/ my-server:~/path/to/remote/project/
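Because --delete removes files, it’s worth previewing the sync first with --dry-run. The sketch below uses two local scratch directories as stand-ins for your local project and the server path, so you can try it safely; the file names are made up.

```shell
src="$(mktemp -d)"    # stand-in for /path/to/local/project
dst="$(mktemp -d)"    # stand-in for my-server:~/path/to/remote/project
echo "print('hi')" > "$src/train.py"
echo "stale" > "$dst/old_results.txt"     # exists only on the destination

if command -v rsync >/dev/null 2>&1; then
    # --dry-run (-n) lists what WOULD be copied and deleted, without changing anything
    rsync -av --delete --dry-run "$src/" "$dst/"

    # The same command without --dry-run performs the sync
    rsync -av --delete "$src/" "$dst/"
    ls "$dst"
else
    echo "rsync not found"
fi
```

Note the trailing slash on the source: "$src/" means “the contents of this folder”, while "$src" without the slash would create the folder itself inside the destination.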

Part 6: The Best of Both Worlds - VS Code Remote SSH

The VS Code “Remote - SSH” extension allows you to use your slick VS Code interface on your local machine with all the extensions you’ve set up, while all the code editing, execution, and terminal commands happen on the powerful remote server.

Step 1: Prerequisites

Install Visual Studio Code on your local machine.
In VS Code, go to the Extensions view (click the icon on the left sidebar).
Search for and install the “Remote - SSH” extension by Microsoft.

Step 2: Connecting to Your Server

The “Remote - SSH” extension will automatically read your ~/.ssh/config file. This is the easiest way to connect.

Open the Command Palette: Ctrl+Shift+P (or Cmd+Shift+P on Mac).
Type Remote-SSH: Connect to Host... and press Enter.
You should see the alias you created earlier (my-server) in the list. Select it. If you don’t see it, you can manually enter the SSH command in the format ssh my-server or the full command discussed earlier.
If prompted, enter your SSH key passphrase or password for the remote server.

A new VS Code window will open. It will take a moment to connect and install a lightweight “VS Code Server” on the remote machine. Once connected, look at the remote indicator in the bottom-left corner of the window (its color depends on your theme). It should say SSH: my-server, confirming you are connected!

Step 3: Your Remote Workspace

You are now controlling the remote server from within VS Code.

Open a Folder: Go to File > Open Folder.... The file dialog that opens shows the filesystem of your remote server. Navigate to your project directory (e.g., ~/path/to/remote/project/) and open it.
Integrated Terminal: Open a new terminal with Ctrl+` (backtick) or Terminal > New Terminal. This is a terminal on your remote server. You can run gpustat, htop, conda activate my-project-env, and any other command right here.
Editing Files: Simply click on files in the Explorer sidebar to open and edit them. The changes are saved directly on the remote server. No more rsync-ing your code after every small change!

Step 4: Setting Up Your Python Environment

This is the most critical step. You need to tell VS Code which Python interpreter (your Conda environment) to use for code completion, linting, and execution.

Open the Command Palette (Ctrl+Shift+P / Cmd+Shift+P).
Type Python: Select Interpreter.
A list of Python interpreters found on your remote server will appear. Find and select the one corresponding to your Conda environment (e.g., ~/conda/envs/my-project-env/bin/python).

Now, VS Code is fully aware of your project’s dependencies. You’ll get smart auto-completion and error checking based on the packages you installed in that environment.

Step 5: The Modern ML Workflow in Action

With everything set up, your workflow becomes incredibly smooth:

Jupyter Notebooks: Open any .ipynb file on your remote server. VS Code’s Jupyter extension will take over (install it on the remote if prompted). You can run cells, and the computation will be executed by the Python kernel from your selected Conda environment on the powerful remote machine.
Git Integration: Although I would recommend using the command line, you can also use VS Code’s source control panel to stage, commit, push, and pull changes with Git, with all commands running directly on the remote server.

Conclusion

This guide has taken you from the fundamentals of SSH and the command line to a professional, hybrid workflow using VS Code.

Happy training!

PS: I keep discovering new things about Linux and remote servers lol. Please share if you have any tips or tricks that I missed!