
From "cloud" to "fog": cloud computing will die instead of distributed point-to-point network

While cloud computing is booming, Viktor Charypar, technical director at Red Badger, a British digital consultancy, argued on VentureBeat that cloud services will come to an end and that peer-to-peer networks are the direction of future development. The article was translated and compiled by 36Kr.

The cloud will come to an end. I know this is a bold conclusion that may sound a little crazy, but please bear with me and let me explain.

There has long been a conventional view that the servers running our applications, whether web or mobile, will live in the cloud. Amazon, Google and Microsoft keep adding tools to their cloud services that make running software on them ever simpler and more convenient. So hosting your code on AWS, GCP or Azure seems like the best thing you can do: it is convenient, cheap, easy to automate, and lets you scale flexibly...

So why do I predict that all of this will end? There are several reasons:

First, it cannot meet long-term scaling requirements

Building a scalable, reliable and highly available web application is quite difficult, even in the cloud. And if you do it well and your application becomes a huge success, the sheer scale will exhaust both your money and your energy. Even if your business is very successful, you will eventually hit the limits of cloud computing: computing speed and storage capacity grow faster than network bandwidth.

Setting aside the net neutrality debate, this may not yet be a problem for most people (except Netflix and Amazon), but it will be soon. As video quality climbs from HD to 4K to 8K, the amount of data we need grows dramatically, and VR datasets are on the way.

This is a problem mainly because of the way we organize the network: there are many users who want content and applications, and only relatively few servers that hold them. For example, when I see a funny photo on Slack and want to share it with the 20 people sitting next to me, they each have to download it from the hosting server, which therefore has to send the same photo 20 times.

As servers move into the cloud, that is, into Amazon's or Google's data centers, the networks near those places need incredible throughput to handle all of this data. In addition, huge numbers of hard disks are needed to store everyone's data, and huge numbers of CPUs to push it through the network to everyone who wants it. With the rise of streaming media services, the situation has only gotten worse.

All of this activity requires a lot of energy and cooling, which makes the whole system inefficient, expensive and harmful to the environment.

Second, it is centralized and fragile

Another problem with storing our data and programs centrally is availability and durability. What if Amazon's data center is hit by an asteroid or destroyed by a tornado? Or what if it simply loses power for a while? The data stored on its machines becomes temporarily inaccessible, or may even be lost permanently.

We usually mitigate this problem by storing data in multiple locations, but that only means more data centers. It may greatly reduce the risk of accidental loss, but what about the data you care about very, very much? Your wedding video, photos of your children growing up, or important public information sources such as Wikipedia. All of this is now stored in the cloud, on services such as Facebook, Google Drive, iCloud or Dropbox. What happens to the data when these services stop operating or run out of funding? And even if they never reach that point, they still restrict how you access your own data: you have to use their service, and when you share with friends, they have to use it too.

Third, it demands trust but provides no guarantees

With cloud services, your friends have to trust that the data they receive really came from you, relayed through a trusted intermediary. In most cases this works well enough, but the sites and networks we use must be legally registered to operate, and regulators have the power to force them to do many things. Mostly this is a good thing, used to help solve crimes or remove illegal content from the network, but there are also many cases where this power is abused.

Just a few weeks ago, the Spanish government did everything it could to stop the independence referendum in Catalonia, including blocking information websites that told people where to vote.

Fourth, it makes our data easier to attack

The real fear with a highly centralized Internet is the centralization of personal data. The big companies that provide our services hold enormous amounts of data about us: enough to predict what you will buy, who you will vote for, when you are likely to buy a house, and even how many children you are likely to have. It is enough information to apply for a credit card, take out a loan, or even buy a house in your name.

And you may be fine with that; after all, you chose to trust these companies when you chose their services. But they are not the ones you need to worry about; everyone else is. Earlier this year the credit reporting agency Equifax lost data on 140 million customers, one of the largest data breaches in history. That data is now public. We could treat this as a once-in-a-decade event that could have been avoided had we been more careful, but it is becoming increasingly clear that data breaches like this are hard to avoid completely, and they are too dangerous to tolerate. The only way to truly prevent them is not to collect data at that scale in the first place.

So, what will replace the cloud?

An Internet built mainly on client-server protocols (such as HTTP), with security based on trust in central authorities (such as TLS), is flawed and leads to problems that are fundamentally hard or impossible to solve. It is time to look for something better: a model in which no one else holds your personal data in full, large media files are spread across the whole network, and the entire system is completely peer-to-peer and serverless (and I don't mean "serverless" in the cloud-hosting sense; I mean literally no servers).

I have read a great deal of the literature in this field, and I am convinced that peer-to-peer is the inevitable direction of our future. Peer-to-peer networking technology replaces the building blocks of the web we know with protocols and strategies that solve most of the problems I described above. The goal is fully distributed, permanently redundant data storage, where every user participating in the network stores copies of some of the available data.

If you have ever heard of BitTorrent, what follows should sound familiar. On BitTorrent, users split large data files into smaller blocks (each with a unique ID) without any central authority granting permission. To download a file, all you need is a "magic" number, a hash, which is the fingerprint of the content. Your BitTorrent client then finds users who have the pieces, using that content fingerprint, and downloads the pieces from them until you have them all.
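To make content fingerprints concrete, here is a minimal Python sketch rather than BitTorrent's actual piece format: it splits a file into fixed-size blocks, derives each block's ID from its content with SHA-256 (the block size and hash choice are my assumptions for illustration), and shows how a downloader can verify what it received.

```python
import hashlib

BLOCK_SIZE = 256 * 1024  # assumed block size for illustration; real protocols pick their own

def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> dict[str, bytes]:
    """Split raw bytes into fixed-size blocks keyed by their content fingerprint."""
    blocks = {}
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        block_id = hashlib.sha256(block).hexdigest()  # the block's content fingerprint
        blocks[block_id] = block
    return blocks

def verify_block(block_id: str, block: bytes) -> bool:
    """A downloader re-hashes a received block to check it matches the ID it asked for."""
    return hashlib.sha256(block).hexdigest() == block_id

data = b"example file contents " * 100_000   # roughly 2 MB of fake data
blocks = split_into_blocks(data)
print(len(blocks), "blocks")
some_id, some_block = next(iter(blocks.items()))
print("verified:", verify_block(some_id, some_block))
```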

An interesting question is how peers find each other. BitTorrent uses a protocol called Kademlia. In Kademlia, every peer on the network has a unique ID of the same length as the block IDs, and a block with a given ID is stored on the nodes whose IDs are "closest" to that block's ID. With random IDs for both blocks and peers, storage should be spread fairly evenly across the network. The block ID does not need to be chosen at random, though: a cryptographic hash is used instead, a unique fingerprint of the block's own content, and this is beneficial. It makes blocks addressable by their content, it makes it easy to verify a block (by recomputing and comparing the fingerprint), and it guarantees that users cannot be served data other than the original.
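Kademlia measures "closeness" as the XOR of two IDs read as an integer. Here is a minimal sketch of that idea, using toy 32-bit IDs instead of Kademlia's real 160-bit IDs and omitting the routing tables (both simplifications are mine):

```python
import hashlib

def toy_id(name: str) -> int:
    """Derive a toy 32-bit ID from a name; real Kademlia IDs are 160 bits."""
    return int.from_bytes(hashlib.sha256(name.encode()).digest()[:4], "big")

def xor_distance(a: int, b: int) -> int:
    """Kademlia's notion of distance: XOR the two IDs and treat the result as a number."""
    return a ^ b

def closest_peers(block_id: int, peers: list[int], k: int = 3) -> list[int]:
    """The k peers whose IDs are closest to the block's ID are the ones that store it."""
    return sorted(peers, key=lambda peer_id: xor_distance(peer_id, block_id))[:k]

peers = [toy_id(f"peer-{i}") for i in range(20)]
block = toy_id("contents of some block")   # the content hash doubles as the block's ID
print([hex(p) for p in closest_peers(block, peers)])
```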

Another interesting property is that by embedding the ID of one block in the content of another block, you link the two together in a way that cannot be tampered with. If the content of the linked block changes, its ID changes and the link breaks. If the embedded link is modified, the ID of the containing block changes as well.
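A small sketch of that tamper-evidence, again assuming SHA-256 as the fingerprint function (my assumption, not something the article specifies):

```python
import hashlib, json

def block_id(content: bytes) -> str:
    return hashlib.sha256(content).hexdigest()

# A block that embeds another block's ID in its content: a Merkle link.
leaf = b"original content"
leaf_id = block_id(leaf)
parent = json.dumps({"link": leaf_id, "note": "points at the leaf"}).encode()
parent_id = block_id(parent)

# If the linked block changes, its ID changes, so the link in the parent breaks.
tampered_leaf = b"tampered content"
assert block_id(tampered_leaf) != leaf_id

# If the embedded link itself is modified, the containing block's ID changes too.
tampered_parent = json.dumps({"link": block_id(tampered_leaf), "note": "points at the leaf"}).encode()
assert block_id(tampered_parent) != parent_id
print("any modification is detectable by re-hashing")
```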

This mechanism of embedding one block's ID in another is what makes it possible to build blockchains (such as those powering Bitcoin and other cryptocurrencies) and even more complex structures, usually called directed acyclic graphs (DAGs). (This kind of link is usually called a "Merkle link" after Ralph Merkle, who invented it, so if you hear people talking about Merkle DAGs, you now have a rough idea of what they mean.) A common example of a Merkle DAG is a Git repository: Git stores the commit history and all directories and files as one huge Merkle DAG.
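As a sketch of how a Git-like Merkle DAG can address an entire tree through a single root hash (a toy content-addressed store of my own; it does not reproduce Git's real object format):

```python
import hashlib

def put(store: dict[str, bytes], content: bytes) -> str:
    """Store a block under the hash of its content and return that ID (its Merkle link)."""
    block_id = hashlib.sha256(content).hexdigest()
    store[block_id] = content
    return block_id

store: dict[str, bytes] = {}

# "File" blocks.
readme_id = put(store, b"hello world\n")
script_id = put(store, b"print('hi')\n")

# A "directory" block linking to the files by ID, and a "commit" block linking to the directory.
tree_id = put(store, f"README {readme_id}\nmain.py {script_id}".encode())
commit_id = put(store, f"tree {tree_id}\nmessage initial commit".encode())
print("root of the DAG:", commit_id)   # one hash addresses the entire tree and its history
```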

This leads to another interesting property of content-addressed distributed storage: it is immutable. Content cannot be changed in place; instead, new revisions are stored alongside existing ones. Blocks that did not change between revisions are reused, because by definition they have the same ID. It also means identical files cannot be duplicated in such a storage system, which makes the storage efficient. So on this new network, every unique funny picture exists only once (although there may be multiple copies spread across the swarm).
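Using the same kind of toy content-addressed store as in the previous sketch, deduplication and cheap revisions fall out of content addressing automatically:

```python
import hashlib

store: dict[str, bytes] = {}

def put(store: dict[str, bytes], content: bytes) -> str:
    """Same toy content-addressed store as in the previous sketch."""
    block_id = hashlib.sha256(content).hexdigest()
    store[block_id] = content
    return block_id

before = len(store)
id_a = put(store, b"the same funny picture")
id_b = put(store, b"the same funny picture")
assert id_a == id_b                 # identical content always gets the identical ID...
assert len(store) == before + 1     # ...so it occupies exactly one slot in the store

# A new revision is stored as a new block next to the old one; unchanged blocks keep their IDs.
v1 = put(store, b"report, draft 1")
v2 = put(store, b"report, draft 2")
assert v1 != v2 and v1 in store and v2 in store
print("blocks stored:", len(store))
```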

Protocols such as Kademlia, Merkle links and Merkle DAGs give us the tools to model file hierarchies and revision timelines and to share them across a large P2P network. There are already protocols that use these technologies to build distributed storage that meets our needs. One that looks very promising is IPFS.

The naming and sharing problem

With these technologies we can solve some of the problems I raised at the beginning: we get distributed, highly redundant storage on the devices connected to the network, storage that can record the history of files and keep every version for as long as needed. This (almost) solves availability, capacity, durability and content verification. It also solves the bandwidth problem: because data is transferred peer-to-peer, there is no central server to overwhelm.

We need another one that can
