Let's get right into it.
Persistent Storage Nodes
Databases, file storage and other persistent systems should not be ephemeral by any means. These are hard storage and require a fixed location unless you can figure out how to move data at a blistering rate in order to avoid catastrophic failure. Replication only mitigates this issue (and only partially at that); it is not designed to be a magic bullet that somehow guarantees some crazy 99.9999999999% availability.

Should storage query layer clients be ephemeral? That depends on what the clients are used for.
- For reading? Most likely not a problem, as the initiating user can try again. A failed read still costs bandwidth, throughput and latency, but it is much easier to resume (see the sketch after this list)!
- For writing, on the other hand, imagine transferring 100 MB to be stored when the storage query layer client dies (for whatever reason). Let's assume the storage engine cleans up the mess left behind (with no fragmentation): what do you communicate to the initiating client (human or machine)? That's right, you can't! The initiating client will likely try again and get connected (in the backend, via a proxy) to another storage query client. Why is this fine with people? Availability? Distribution? "The data store suffered a failure, please try your 100 MB upload again." That is wasted bandwidth, throughput and latency!
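To make the read case concrete, here is a minimal sketch of what "resuming" looks like from a read client's perspective, assuming the storage layer speaks HTTP and honours Range requests. The endpoint URL and file names are made up; the point is only that a failed read can pick up at a byte offset, while a failed write has no equivalent shortcut.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
)

// resumeDownload continues a partially finished read from wherever it stopped.
// Any server that honours HTTP Range requests behaves the same way.
func resumeDownload(url, localPath string) error {
	f, err := os.OpenFile(localPath, os.O_CREATE|os.O_APPEND|os.O_WRONLY, 0o644)
	if err != nil {
		return err
	}
	defer f.Close()

	info, err := f.Stat()
	if err != nil {
		return err
	}

	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return err
	}
	// Ask only for the bytes we have not received yet.
	req.Header.Set("Range", fmt.Sprintf("bytes=%d-", info.Size()))

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusPartialContent && resp.StatusCode != http.StatusOK {
		return fmt.Errorf("unexpected status: %s", resp.Status)
	}

	_, err = io.Copy(f, resp.Body)
	return err
}

func main() {
	// Hypothetical endpoint: if the read client dies, the caller simply
	// resumes from the current byte offset instead of starting over.
	if err := resumeDownload("http://storage.example/objects/big.bin", "big.bin"); err != nil {
		fmt.Fprintln(os.Stderr, "resume failed:", err)
	}
}
```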
An ephemeral node can go down at any time for no reason whatsoever. Is it worth the latency and decreased throughput to put storage write clients on an ephemeral node? I would argue it is not, given how randomly nodes in the cluster die. Write clients should sit in a fixed location behind a load balancer, where the same loss/retry cycle can actually be tolerated.
I would like to know which distributed system architectures, if any, account for the death of a node while a client is mid-transfer. Failure is usually handled with an error message and a retry of the idempotent operation. Idempotency is nice, but not for large files.
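The only way I can see to keep idempotency workable for large files is to break the transfer into chunks and retry only the chunk that was in flight when the node died. Below is a rough sketch assuming a hypothetical chunked-upload endpoint; the /upload/{id}/part/{n} route is invented here, loosely in the spirit of multipart-upload schemes, and is not any particular store's API.

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
	"os"
)

const chunkSize = 8 << 20 // 8 MiB per part

// uploadChunk sends one part of the file. Retrying part n is idempotent
// because the server simply overwrites the same slot.
func uploadChunk(client *http.Client, baseURL, uploadID string, n int, data []byte) error {
	url := fmt.Sprintf("%s/upload/%s/part/%d", baseURL, uploadID, n)
	resp, err := client.Post(url, "application/octet-stream", bytes.NewReader(data))
	if err != nil {
		return err
	}
	resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("part %d: unexpected status %s", n, resp.Status)
	}
	return nil
}

// uploadFile splits a file into parts and retries each part independently,
// so a dead storage client costs at most one chunk of wasted transfer,
// not the whole file.
func uploadFile(baseURL, uploadID, path string) error {
	data, err := os.ReadFile(path)
	if err != nil {
		return err
	}
	client := &http.Client{}
	for n := 0; n*chunkSize < len(data); n++ {
		end := (n + 1) * chunkSize
		if end > len(data) {
			end = len(data)
		}
		part := data[n*chunkSize : end]
		// Retry only the failed part a few times before giving up.
		var lastErr error
		for attempt := 0; attempt < 3; attempt++ {
			if lastErr = uploadChunk(client, baseURL, uploadID, n, part); lastErr == nil {
				break
			}
		}
		if lastErr != nil {
			return lastErr
		}
	}
	return nil
}

func main() {
	if err := uploadFile("http://storage.example", "demo-upload", "big.bin"); err != nil {
		fmt.Fprintln(os.Stderr, "upload failed:", err)
	}
}
```

With this shape, losing a storage query client mid-transfer wastes at most one chunk of bandwidth instead of the whole 100 MB.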
If you have a metric shit ton of nodes, enough to replicate data widely, and the in-memory storage/file system is big enough to tolerate large and random node failures, then it would not matter whether the nodes were cloud or fixed, and sure, cloud works. But I haven't seen anywhere with a deployment that large. At that point, ephemeral would simply mean unstable nodes and fixed would mean stable, long-term nodes.