Ask HN: What OS/distro is used at an AI datacenter?

2 points by kranke155 6 hours ago

I am making a film where we see a "desktop" screen of someone working at an AI data centre. They are deploying a new AI model.

I'm a tad confused about this, because I don't know what OS / Linux distro is used at the AI data centre level.

I'd like this to be authentic, but I can't find a lot of information about what is used.

Main question:

I am assuming at the data centre level, one would use Linux? Is that correct?

More questions:

1. So - what about CUDA on Linux? Does CUDA run really well on linux, or do they somehow run the models autonomously from the OS layer when they are running these huge AI models?

2. So if they're deploying a new model, do they use the linux terminal or something else, some sort of nvidia terminal layer, or even custom tools?

3. Is there a preferred distro for an AI data centre, or is this the case that each AI lab makes their own distro or chooses their own?

4. Also, should I assume this is CLI only, or would they have a distro that has a GUI?

I couldn't easily find answers to these questions, so I hope it's OK to ask them here on HN.

This would be a video/multimodal model. The company is a fictional version of YouTube.

gogurt2000 2 hours ago

0. You're spot on: data centers almost exclusively run Linux.

1. CUDA runs well on Linux, but it's not really clear to me what you're asking in the second half of this question. CUDA is the toolkit and runtime that NVIDIA ships, alongside its GPU driver, so that developers can write and run software on its specialized hardware. You wouldn't really run anything on NVIDIA's hardware without it. You also wouldn't really run anything independent of the OS: the models run as ordinary Linux processes that talk to the GPUs through the driver.

2. Yes, when deploying a new model in a datacenter setting you'd use a Linux terminal. Any custom tools used in a datacenter will almost always be command-line tools.

3. Data centers typically run CentOS, RHEL, Debian, or Ubuntu. CentOS if you're a large tech company (these days that means CentOS Stream or a rebuild like Rocky Linux, since classic CentOS Linux has reached end of life). RHEL if you're not (i.e. you're a car company looking to offload some liability on Red Hat when there are technical problems). Debian or Ubuntu if you're a tech startup. Big cloud providers typically provide their own distro, like Amazon Linux or Oracle Linux (both RPM-based and in the RHEL/Fedora family), for ease of use and support, but you can run whatever distro you'd like. If you're running something in a container you might use a more lightweight, security-focused distro like Alpine.

For an AI datacenter, I'd expect to see RHEL, Debian, or Ubuntu because NVIDIA officially supports them. You'd need a very good reason to put resources into customizing a distro or using an oddball distro.

4. Working on anything in a datacenter is done through a terminal, but that doesn't mean the computer you're using doesn't have a GUI. People sit at Linux, Windows, and Mac desktops with terminals connected to remote machines in the datacenter. How they interact with the datacenter is CLI only, but they've got a GUI because they have other stuff open like a web browser, Slack, or an IDE.
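If it helps to have something concrete on screen, here's a minimal Python sketch of the kind of sanity check someone might run after logging into a GPU box. It reads /etc/os-release (the standard file Linux distros use to identify themselves) and looks for nvidia-smi, the status tool that ships with the NVIDIA driver. The EXPECTED shortlist is just the distros I named above, and the function names are made up for illustration, so treat it as a sketch rather than anything a real lab runs.

```python
import shutil


def parse_os_release(text):
    """Parse the KEY=value lines of an os-release file into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Values may be wrapped in single or double quotes.
        fields[key] = value.strip().strip('"').strip("'")
    return fields


def distro_family(fields):
    """Return the distro ID plus any ID_LIKE entries, e.g. {'ubuntu', 'debian'}."""
    ids = {fields.get("ID", "").lower()}
    ids.update(fields.get("ID_LIKE", "").lower().split())
    ids.discard("")
    return ids


# Assumption: a hypothetical shortlist based on the distros named in this comment.
EXPECTED = {"rhel", "debian", "ubuntu"}


def main(path="/etc/os-release"):
    try:
        with open(path, encoding="utf-8") as f:
            fields = parse_os_release(f.read())
    except OSError:
        print(f"could not read {path}")
        return
    family = distro_family(fields)
    print("distro:", fields.get("PRETTY_NAME", "unknown"))
    print("on shortlist:", bool(family & EXPECTED))
    # shutil.which returns None when the command is not on PATH.
    print("nvidia-smi on PATH:", shutil.which("nvidia-smi") is not None)


if __name__ == "__main__":
    main()
```

On a RHEL rebuild like Rocky Linux the ID is not "rhel", but ID_LIKE normally lists it, which is why the check looks at both fields.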

Hope that helps!