For years I’ve had a dream of building a rack mounted PC capable of splitting its resources to host multiple GPU intensive VMs:
- a few gaming VMs
- a VM for work that can run DaVinci Resolve and Blender renders
- an LLM server
- a Stable Diffusion server
- media server
Just to name a few possibilities…
Every time I’ve looked into it, it seemed like the technology just wasn’t there yet. I remember a few years ago Linus TT took a shot at it, but in the end suggested the technology (for non-commercial entities) just wasn’t in a comfortable spot yet.
So how far off are we? Obviously AI focused companies seem to make it work, but what possibilities exist for us self-hosters who might also want to run multiple displays in addition to the web gui LLM servers? And without forking out crazy money for GPU virtualization software licenses?
As others have expressed, we’re already there. Understand, though, that the reason this hasn’t caught on in the mainstream is simple: the entire purpose of what you are asking for runs counter to the standards of commercial capitalism. We are talking about efficiency, self-hosting, doing more with less, and cutting strings.
That said- understand that what you are undertaking is not dissimilar from building infrastructure in a company. You are building and expanding to meet your needs. Your needs are unique so there isn’t a ‘turn key’ solution that will fit perfectly… so you need to try things and see what works.
As far as the things you are talking about specifically: you are ultimately going to be dipping your toes into the virtualization world… so xcp-ng and proxmox are good choices. If you can get your hands on older copies and uh… source a key or two: esxi is also very beginner friendly, but you won’t be able to upgrade thanks to their new pricing model. You seem like you are aware of the YouTube sphere, so let me recommend 2GuysTech and the series on different hypervisors.
Once you decide on a hypervisor it’s as ‘simple’ as building a PC to meet your needs. If you have one already I’d start there to get a feel for how much you can pull out of it to determine how much you may need. You can probably split up a single GPU or just pass it through (cost vs performance). LLMs are power / resource hungry, so that may require its own GPU.
If power is cheap where you are, you can look into older server hardware, but honestly this can be a messy space to dabble in (noise, heat, power costs).
From there play with services that fit your needs.
It’s very doable and there are some easier paths to take… certainly- but again the thing about homelabs is it’s very custom. This is why the community (in general) is willing to help. We all have had to forge the same path.
100% ^^^ This.
You could do everything with openstack, and it would be a great learning experience, but expect to dedicate about 30% of your life to running and managing openstack. When it just works, it’s great… when it doesn’t… ohh boy, it’s like a CRPG which will unlock your hardware after you finish the adventure.
Can this solution deliver 3+ streams of high resolution (1440p or higher and 144fps) low latency video with no artifacting and near native performance and responsiveness?
Gaming has a high requirement for high fidelity and low latency I/O. No one wants to spend all this money on racks and thin clients and then get laggy windows and scrolling, artifacts, video compression, and low resolution.
That’s the problem at hand with a gaming server: if you want to replace a gaming desktop with a VM in a rack, you need to actually get the I/O to the user somehow, either through dedicated cables from the rack, fiber, or networking. The first is impractical: it involves potentially 100ft long runs of multiple DisplayPort, HDMI, USB, etc., and is very rigid in its application. The second is very expensive, shooting the price up to thousands of dollars per seat for DisplayPort/USB over fiber extenders. As for the third, I have yet to see a VNC/remote solution that can deliver near native video performance.
I should reiterate, the op wants to do fidelity sensitive tasks, like video editing, they don’t just need to work on a spreadsheet.
Yes, for some definition of ‘low latency’.
GeForce Now, shadow.tech, and Luna all demonstrate this is done at scale every day.
Do your own VM hosting in your own datacenter and you can knock off 10-30ms of latency.
However you define low latency there is a way to iteratively approach it with different costs. As technology marches on, more and more use cases are going to be ‘good enough’ for virtualization.
Quite frankly, if you have an all-optical network, being 1m away or 30km away doesn’t matter.
Just so we are clear, local isn’t always the clear winner, there are limits on how much power, cooling, noise, storage, and size that people find acceptable for their work environment. So there is some tradeoff function every application takes into account of all local vs distributed.
Right, but who has the resources to rent compute with multiple GPUs? This is a gaming setup, not office work, and the op was talking about racking it.
All of those services offer an inferior experience to being at the hardware; it’s just not the same experience. Seriously, try it with multiple 1440p 144hz displays: it just doesn’t work out well, and you are getting a compromised product for a higher cost. You need a good GPU (or at least a way to decode multiple HEVC streams) in the client, so you can’t just run a standard thin client.
‘low latency’ is a near native experience. I’m talking, you sit down at your desk and it feels like you are at your computer (that is to say, multiple monitors, hdr, USB swapping, Bluetooth, audio, etc., all working seamlessly without noticeably diminished quality). Anything less isn’t worth it, since you can just use your computer like normal.
Remember, the original poster here was talking about running their own self-hosted GPU VM. So they’re not paying anybody else for the privilege of using their hardware.
I personally stream with moonlight on my own network. I have no issues; it’s just like being on the computer from my perspective.
If it doesn’t work for you, fair enough, but it can work for other people, and I think the original poster’s idea makes sense. They should absolutely run a GPU VM cluster and have fun with it, and it would be totally usable.
Yea I do, you brought up that local isn’t always the option.
I desperately want it to work for me, I just can’t get it to work without spending thousands of dollars on hardware just to get back to the same experience as having a regular desktop at my desk.
Okay. Do you want to debug your situation?
What’s the operating system of the host? What’s the hardware in the host?
What’s the operating system in the client? What’s the hardware in the client?
What does the network look like between the two? Including every piece of cable, and switch?
Do you get sufficient experience if you’re just streaming a single monitor instead of multiple monitors?
This. Exactly. Many solutions exist but need to be selected based on scale and personal needs.
None of the presented solutions cover the aspect of being in a different place than the rack, the same network is fine, but at a minimum a different room.
How do you deliver high resolution (e.g. 1440p, 144 fps) to multiple monitors with low latency over a network? I haven’t seen anything like that accomplished without running fiber from the host.
Eventually, your thin client will need too much power anyway, making the costs rise a lot. It makes sense in an office where you have 500 seats and you can load balance resources.
If someone can show me a multi seat gaming server that has native remote performance (as in you drag windows around in 144 fps, not the standard artifacty high latency behavior of vnc) I’ll eat a shoe.
Thin clients absolutely can do this already. There are a variety of ways to transmit low latency video around a home, from HDBaseT solutions to multicast / network driven ones. Never mind basic solutions like Sunshine/Moonlight… Nvidia variants etc.
I have a single racked PC for feeding my home which has 3 ‘desk’ endpoints and two tvs… all of which are fed from the same location and can be dynamically matrixed (albeit the choke point is usb2 to each location because I’m cheap). Latency is maybe 1.5-3 frames from live. Other solutions are normally around 5-8, which while higher are sufficiently snappy and won’t affect competitive play (professional level notwithstanding).
A lot of latency comes down to tuning your solution and research. The vnc method you refer to is the lowest common denominator running on ancient technology and codecs simply because it is a widely supported standard.
Edit: As far as 144 goes- I don’t have any displays that run that but I have two running at 120 with no issue.
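For readers who think in milliseconds rather than frames, here is a minimal Python sketch that converts the frame counts quoted above. The refresh rates are an assumption: the comment does not say at which rate the 1.5-3 and 5-8 frame figures were measured, so the sketch prints both 60 Hz and the 120 Hz mentioned in the edit. The function name and labels are illustrative only.

```python
def frames_to_ms(frames, refresh_hz):
    """Convert a delay measured in displayed frames to milliseconds."""
    return frames * 1000.0 / refresh_hz


# Frame-delay ranges quoted in the comment above.
RANGES = {
    "matrixed setup": (1.5, 3),
    "other solutions": (5, 8),
}

# Assumed refresh rates; the source does not state which one applies.
for label, (low, high) in RANGES.items():
    for hz in (60, 120):
        low_ms = frames_to_ms(low, hz)
        high_ms = frames_to_ms(high, hz)
        print(f"{label} @ {hz} Hz: {low_ms:.1f}-{high_ms:.1f} ms")
```

At 120 Hz, 1.5-3 frames works out to 12.5-25 ms; the same frame counts at 60 Hz are twice that.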
What is the cost of the thin clients and are you doing this over copper?
Are your desks multi monitor? To get the bare minimum in my household’s scenario I would need at least 12 streams at greater than 1080p.
For 5 seats how much did it cost versus just having a computer in each location? For example looking at hdbaset to replace just my desk setup, I would need 4 ~$350 devices, just looking at monoprice for an idea (https://www.monoprice.com/product?p_id=21669) which doesn’t even cover all of the screens in my office.
The two workstation nooks (spaces) have the capability to have a second monitor but I’ve since retired them in favor of ultrawide monitors which I find are a better experience in general. My current working solution is a split between two technologies: one thin client (second monitors) and one network distribution solution using multicast (primary displays and USB). Both run on copper 1 gig but the multicast traffic requires a switch that doesn’t suck and vlan usage. On average a single port can reach 70-85% usage sustained. I believe my longest run is 150’ ish.
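To put the 1 gig copper figure in perspective, here is a rough Python sketch of how much compression a stream needs to fit in that link. Everything beyond the 1 Gbit/s link and the 70-85% utilization is an assumption: it uses the 2560x1440 at 144 fps target asked about earlier in the thread (the ultrawide resolution is not stated), 24 bits per pixel, and it ignores blanking intervals and protocol overhead. The function names are made up for illustration.

```python
def raw_bitrate_gbps(width, height, fps, bits_per_pixel=24):
    """Uncompressed video bitrate in Gbit/s (no blanking, no overhead)."""
    return width * height * fps * bits_per_pixel / 1e9


def min_compression_ratio(raw_gbps, link_gbps, utilization):
    """Smallest compression ratio that fits the stream into the usable link."""
    return raw_gbps / (link_gbps * utilization)


# Assumed stream: 2560x1440 at 144 fps, 24 bits per pixel.
raw = raw_bitrate_gbps(2560, 1440, 144)
print(f"raw 1440p144: {raw:.2f} Gbit/s")

# 70-85% sustained usage of a 1 Gbit/s port, as quoted above.
for utilization in (0.70, 0.85):
    ratio = min_compression_ratio(raw, 1.0, utilization)
    print(f"at {utilization:.0%} of 1 Gbit/s: at least {ratio:.1f}:1 compression")
```

Under those assumptions the raw stream is about 12.74 Gbit/s, so it needs roughly 15:1 to 18:1 compression to ride a single gigabit port. This is why every network-based option in this thread involves a codec somewhere.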
Cost per node is roughly 300- so comparable to what you are experiencing. If I went stupid cheap I could probably cut that to maybe 150-250 depending on my luck with eBay and patience.
In terms of capabilities you could argue that this could be done without distribution using a nuc solution… but you’d have to split resources out to each node where you need a full feature set.
My central server is a threadripper build with 2 gpus for direct passthrough to ‘gaming’ vms and a split gpu handling the rest of the needs of the other systems. Thanks to the matrix capabilities any given seat can be any system… or in some cases 2 seats can be a single rig (2 room gaming off the same display). There is a cost savings to be found in splitting resources from a more expensive build out to cheaper nodes… but ymmv depending on active seats and specific needs. I believe as a general rule it should be less costly and more efficient (power/heat) than individual solutions.
Thanks for the breakdown! This is probably the most helpful breakdown I’ve seen of a build like this.
Absolutely 👍. I’ll just add that there are a lot of alternate routes to get the result you want so research and experiment but ideally set a deadline which can help with decision paralysis. Later changes are a problem for future you 😁.
Yep just ping time and latency make this a no go for a vast majority of us.
Can you define what acceptable latency would be?
local network ping (like corporate networks) 1-2ms
Encoding and decoding delay 10-15ms
So about ~20ms of latency
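The budget above can be tallied in a few lines of Python. This is only a sketch of the arithmetic using the ranges quoted in this comment; the refresh rates used for the frame conversion are assumptions, and the names are illustrative. The quoted ranges sum to 11-17 ms, which the comment rounds up to about 20 ms.

```python
# Latency components quoted above, as (low, high) ranges in milliseconds.
NETWORK_PING_MS = (1, 2)    # local network ping
CODEC_DELAY_MS = (10, 15)   # encoding and decoding delay


def budget_range(*components):
    """Sum (low, high) ranges component-wise into one total range."""
    low = sum(c[0] for c in components)
    high = sum(c[1] for c in components)
    return low, high


def frame_time_ms(refresh_hz):
    """Duration of one displayed frame at the given refresh rate."""
    return 1000.0 / refresh_hz


low, high = budget_range(NETWORK_PING_MS, CODEC_DELAY_MS)
print(f"added latency: {low}-{high} ms")

# Assumed refresh rates, to express the budget in frames.
for hz in (60, 120, 144):
    ft = frame_time_ms(hz)
    print(f"{hz} Hz: {low / ft:.1f}-{high / ft:.1f} frames")
```

Expressing the total in frames makes it easier to compare against figures that other commenters give in frames rather than milliseconds.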
Real world example
Fiber isn’t some exotic never seen technology, it’s everywhere nowadays.
Moonlight literally does what you want, today, using HEVC encoding straight in the GPU.
Try it out on your own network now.
A display port to fiber extender is $2,000. The fiber is not for the network.
Moonlight does not do what I want; moonlight requires a GPU on the thin client to decode. You would need a high end GPU to decode multiple high resolution video streams. Also afaik, moonlight doesn’t support multiple displays.
Fair enough. If you know it doesn’t work for your use case that’s fine.
As demonstrated elsewhere in this discussion, GPU HEVC encoding only requires 10ms of extra latency, then it can transit over fiber optic networking at very low latency.
Many GPUs have HEVC decoders on board, including those in cell phones. Most newer Intel and AMD CPUs actually have an HEVC decoder pipeline as well.
I don’t think anybody’s saying a self-hosted GPU VM is for everybody, but it does make sense for a lot of use cases. And that’s where I think our schism is coming from.
As far as the $2,000 transducer to fiber… it’s doing the same exact thing, just with more specialized equipment and maybe a little bit lower latency.