I don't like these in most cases. Before y'all yell at me, lemme explain.
-
1. Node-to-Node communication is a massively important problem. The easiest way to solve node-to-node communication is to have all the devices on the same silicon die, i.e., buy a 64-core EPYC. (Note: internally, AMD actually solved die-to-die communication with its Infinity Fabric, and that's the key to its high-speed core-to-core communication despite having so many cores on one package.)
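At this tier, "communication" is just shared memory and cache coherence, so plain threading gets you there. A minimal sketch in C with OpenMP (assuming gcc with -fopenmp; the workload is made up purely for illustration):

```c
#include <stdio.h>
#include <omp.h>

int main(void) {
    long long sum = 0;

    /* Every core on the package sees the same coherent memory, so there is
       no explicit message passing at this tier; the hardware (e.g. Infinity
       Fabric) moves the cache lines for you. */
    #pragma omp parallel for reduction(+:sum)
    for (long long i = 0; i < 1000000000LL; i++)
        sum += i;

    printf("threads=%d sum=%lld\n", omp_get_max_threads(), sum);
    return 0;
}
```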
-
2. Node-to-Node communication is a massively important problem. Once you've maxed out the size of a single package, like a 64-core EPYC, the next step is chip-to-chip communication, such as dual-socket boards (2x CPUs running on one motherboard). In practice, this is an extension of AMD's Infinity Fabric. Note that Intel has an Ultra Path Interconnect that works differently but has similar specs (up to 8-socket CPU-to-CPU communication, NUMA awareness, etc. etc.).
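Dual-socket is also where NUMA starts to bite: memory hanging off the other socket is measurably slower to reach. A minimal sketch of NUMA-aware allocation in C with libnuma (assuming Linux with libnuma installed; link with -lnuma):

```c
#include <stdio.h>
#include <numa.h>   /* libnuma; link with -lnuma */

int main(void) {
    if (numa_available() < 0) {
        printf("no NUMA support on this machine\n");
        return 1;
    }
    printf("NUMA nodes (roughly, sockets): %d\n", numa_max_node() + 1);

    /* Pin this thread to node 0 and allocate on node 0, so the data
       never has to cross the socket-to-socket link. */
    numa_run_on_node(0);
    size_t sz = 64UL << 20;                     /* 64 MiB */
    char *local = numa_alloc_onnode(sz, 0);

    local[0] = 1;   /* ...real work on node-local memory here... */

    numa_free(local, sz);
    return 0;
}
```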
-
3. Node-to-Node communication is a massively important problem. Once you've maxed out the speed possible on a single motherboard, your next step is a high-speed board-to-board connection. NVidia's NVLink is perhaps the best example of this, with GPU-to-GPU bandwidth measured in hundreds of GBs/second, approaching TBs/second on the newest generations.
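On the software side, NVLink shows up through CUDA's peer-to-peer API: one GPU reads and writes another GPU's memory directly, no detour through host RAM. A minimal sketch (assuming 2+ peer-capable NVidia GPUs and the CUDA toolkit; error checking omitted for brevity):

```c
#include <stdio.h>
#include <cuda_runtime.h>

int main(void) {
    int can = 0;
    /* Can GPU 0 touch GPU 1's memory directly? Over NVLink the answer is
       yes at hundreds of GB/s; over plain PCIe it is much slower. */
    cudaDeviceCanAccessPeer(&can, 0, 1);
    if (!can) { printf("no peer access between GPU 0 and GPU 1\n"); return 1; }

    cudaSetDevice(0);
    cudaDeviceEnablePeerAccess(1, 0);

    size_t sz = 256UL << 20;            /* 256 MiB */
    float *buf0, *buf1;
    cudaMalloc((void **)&buf0, sz);
    cudaSetDevice(1);
    cudaMalloc((void **)&buf1, sz);

    /* Direct GPU-to-GPU copy; the interconnect does the work. */
    cudaMemcpyPeer(buf1, 1, buf0, 0, sz);
    cudaDeviceSynchronize();
    printf("copied %zu MiB GPU0 -> GPU1\n", sz >> 20);
    return 0;
}
```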
-
4. Node-to-Node communication is a massively important problem. Once you've maxed out NVLink's point-to-point links, you use NVidia's NVSwitch to extend that bandwidth to more GPUs (this is how all eight GPUs in a DGX box talk to each other at full speed).
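In practice you rarely drive NVSwitch by hand; collective libraries like NCCL pick the NVLink/NVSwitch routes for you. A minimal single-process sketch of an all-reduce across every visible GPU (assuming the CUDA toolkit and NCCL are installed; error checking omitted for brevity):

```c
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>
#include <nccl.h>

int main(void) {
    int ndev = 0;
    cudaGetDeviceCount(&ndev);

    ncclComm_t *comms = (ncclComm_t *)malloc(ndev * sizeof(ncclComm_t));
    cudaStream_t *streams = (cudaStream_t *)malloc(ndev * sizeof(cudaStream_t));
    float **buf = (float **)malloc(ndev * sizeof(float *));
    const size_t count = 1 << 20;       /* 1M floats per GPU */

    /* One communicator per local GPU (single-process mode). */
    ncclCommInitAll(comms, ndev, NULL);
    for (int i = 0; i < ndev; i++) {
        cudaSetDevice(i);
        cudaStreamCreate(&streams[i]);
        cudaMalloc((void **)&buf[i], count * sizeof(float));
    }

    /* Sum the buffers across all GPUs. NCCL routes this over
       NVLink/NVSwitch when present, PCIe otherwise. */
    ncclGroupStart();
    for (int i = 0; i < ndev; i++)
        ncclAllReduce(buf[i], buf[i], count, ncclFloat, ncclSum,
                      comms[i], streams[i]);
    ncclGroupEnd();

    for (int i = 0; i < ndev; i++) {
        cudaSetDevice(i);
        cudaStreamSynchronize(streams[i]);
        cudaFree(buf[i]);
        ncclCommDestroy(comms[i]);
    }
    printf("all-reduce across %d GPUs complete\n", ndev);
    return 0;
}
```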
-
5. Node-to-Node communication is a massively important problem. Once you've maxed out a cluster node with dual-socket EPYCs and NVLink + NVSwitch GPUs, you then need to build out the longer-range communication network. 10Gbit Ethernet can be used, but 400Gbit InfiniBand is popular amongst nation-state supercomputers for a reason. I think I've read papers suggesting that 40Gbit or 100Gbit fiber-optic Ethernet is a good in-between that yields acceptable results (not as fast as InfiniBand, but still much faster than your standard RJ-45 consumer Ethernet). 10Gbit Ethernet was used on some projects IIRC, so if you're trying to save money on the interconnect, it's still doable.
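Whatever the wire is, the API on top is usually MPI, and the gap between these interconnects shows up immediately in a bandwidth test. A minimal ping-pong sketch in C (assuming an MPI implementation like Open MPI or MPICH; compile with mpicc and run with mpirun -np 2 across two nodes):

```c
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (size < 2) { MPI_Finalize(); return 1; }

    const int n = 1 << 24;              /* 16 MiB payload */
    const int reps = 100;
    char *buf = malloc(n);
    memset(buf, 0, n);

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int i = 0; i < reps; i++) {
        if (rank == 0) {                /* bounce the buffer back and forth */
            MPI_Send(buf, n, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, n, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, n, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(buf, n, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    double t1 = MPI_Wtime();

    if (rank == 0)   /* 2x because each rep crosses the wire twice */
        printf("%.1f MB/s\n", 2.0 * reps * n / (t1 - t0) / 1e6);

    free(buf);
    MPI_Finalize();
    return 0;
}
```

Run that over consumer gigabit, then over InfiniBand, and the tier list above stops being abstract.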
So when I see "someone built a clustered computer out of 1Mbit (aka 0.000001 Tbit/sec) I2C communications", it's hard for me to get excited, lol. The entire problem space is, in practice, defined by the gross difficulty of computer-to-computer communication... and I2C is just not designed for this space. Surprisingly, modern supercomputers are "just servers", so anyone who has experience with the $20,000 class of Xeons or EPYCs has experience with real supercomputer hardware today. (Which y'all can rent for just a few dozen $ per day from Amazon Web Services btw, if you really wanted to. Cloud computing has made high-performance computing accessible, in practice, to even the cheapest hobbyist.)
Now... when am I excited about "cheap clusters"? Well, the #1 problem with the approach I listed in 1-5 above is that such a beastly computer costs $10 million or more. Even entire nation-states can struggle to find the budget for that, let alone smaller corporations or hobbyists. But the "skills" needed to program a $10-million supercomputer are still required, so we need to think about how to train the next generation of programmers to use these expensive supercomputers. (Unlike a rented AWS instance, a Rasp. Pi cluster has to be taken care of by its administrator, which builds real administration skills.)
There was a project that clustered Rasp. Pis over standard Ethernet using MPI (Message Passing Interface), which handles this latter case. By using Rasp. Pis + commodity Ethernet switches as the basis of the cluster, it only costs thousands of dollars, not millions, to build a large cluster of hundreds of Rasp. Pis. MPI is one of the real APIs used on the big-boy, nation-state-level supercomputers as well. Rasp. Pis are not NUMA-aware, nor do they have a good GPU-programming interface, however, so it's not a perfect emulation of the issues. But it's good enough to teach students. A Rasp. Pi supercomputer will never be "useful" in the real world outside of training, but student training is a good enough reason to be excited.
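The nice part is that the exact same code a student writes on the Pi cluster runs unchanged on the big machines. A minimal sketch in C (the hostfile name pis.txt is hypothetical; any list of node addresses works, e.g. run with something like mpirun -hostfile pis.txt -np 100 ./sum_ranks):

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Each Pi contributes its rank; MPI_Reduce sums them on rank 0.
       The same call works whether the "nodes" are Pis or EPYC servers. */
    int local = rank, total = 0;
    MPI_Reduce(&local, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum of ranks across %d nodes = %d\n", size, total);
    MPI_Finalize();
    return 0;
}
```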
I look at this Rasp. Pi Pico cluster and... it's not clear to me how this teaches people about the "big boy" supercomputers, or how it'd be more useful than standard multi-core programming. I2C is not the language of high-performance computing. And the Rasp. Pi Pico cannot run Linux or other OSes that'd teach practical administration skills either.
For the embedded hobbyist, I'd suggest grabbing a Xilinx Zynq FPGA+ARM chip and experimenting with the high-performance compute available to you from custom FPGA logic. That's how satellite and military programs get large amounts of computational power into small power envelopes, which is likely why you'd be "interested" in a Rasp. Pi Pico in the first place. (Power constraints from small satellites or military weight restrictions prevent them from using real-world supercomputers on RADARs or whatever.) You can reach absurdly powerful levels of compute with incredibly low power usage with this pattern. (And I presume anyone in the "Microcontrollers" discussion is here because we have an interest in power-constrained computing.)
If power is not a constraint for you... then you can study up on GPU programming / Xeon servers / EPYC servers / etc. etc. for the big stuff. I am the moderator at https://lemmy.world/c/gpu_programming btw, so we can talk more over there if you're interested in the lower-level GPU-programming details that'd build up to a real supercomputer. The absurd amount of compute available for just $500 or so today cannot be overstated: an NVidia 4080 or an AMD 7900 XTX has more compute power than entire Cray supercomputing clusters of the early 2000s. Learning how to unlock this power is what GPU programming (CUDA, DirectX, HIP/ROCm, OpenCL) is all about. I'm no expert in how to hook these clusters together with MPI / InfiniBand / etc. etc., but I can at least help you on the GPU side.
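To give a taste of what that looks like: a minimal CUDA sketch (assuming the CUDA toolkit; compile with nvcc) that launches 16 million threads for a SAXPY, more or less the hello-world of GPU programming:

```c
#include <stdio.h>
#include <cuda_runtime.h>

/* One thread per element: y = a*x + y */
__global__ void saxpy(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

int main(void) {
    const int n = 1 << 24;              /* 16M elements */
    float *x, *y;
    cudaMallocManaged((void **)&x, n * sizeof(float));
    cudaMallocManaged((void **)&y, n * sizeof(float));
    for (int i = 0; i < n; i++) { x[i] = 1.0f; y[i] = 2.0f; }

    /* 16M threads in one launch: the scale of parallelism a $500 GPU hands you. */
    saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, x, y);
    cudaDeviceSynchronize();

    printf("y[0] = %.1f (expect 4.0)\n", y[0]);
    cudaFree(x); cudaFree(y);
    return 0;
}
```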