Cloud R&D architecture pattern

By chance, I came across Nocalhost, Tencent’s new cloud-native development environment tool. I was very excited the first time I saw it: my cloud R&D theory (“Cloud R&D: R&D is Code”) had finally reached a historic moment.

In short, Nocalhost does one simple thing: it puts the local development environment in the cloud. Combined with the recent craze for no-code/low-code development and all kinds of code generation, and with the Datum language (formerly Charj) that my colleagues and I are working on, it led me to see a new, next-generation architectural pattern. Although the theory is not yet complete, I am tentatively calling it Water, because the shape of the code will no longer matter and the code will no longer need one.

Water coding architecture: the output of human programming is no longer a string of text in some programming language. A programming language is only how the code is presented in the UI while editing; the stored form is a series of intermediate representations of the code, such as AST, HIR, and MIR. Code is no longer tied to the operating system and the file system, and it can be managed in smaller architectural units, namely functions. Code no longer needs to be stored as files, and the architecture of the system is no longer a hierarchy divided by directories.

Too long; didn’t read version

Real-time development environment: an editor/IDE that renders code from the AST, plus a browser.

  • Language-independent editor:
    • Rendering. Receives a generic AST and renders it as code in a programming language
    • Interactive editing. Handles the interaction with the developer
    • Language independence. Displays and edits code in any language
  • Browser

Data transfer: binary intermediate AST

Cloud environment: a cloud-native Dev cloud + architecture engine + language database.

  • Architecture engine:
    • Operating-system independence. Plans a file and folder architecture that does not depend on the operating system
    • Automated architecture. Plans the architecture automatically from the semantics of the code
    • Graph-database-like representation (TBC)
  • Programming language database (TBD)
  • Cloud-native Dev cloud:
    • Dev Online. A cloud-native counterpart of the local development environment, that is, development is deployment
    • Prod Ready. Ready to deploy and go online

Introduction 1: No-code and cloud development

Code complexity, like force, does not disappear, nor does it appear out of nowhere; it only changes from one form into another.

In 2019, prompted by the popularity of the middle-platform approach and the rise of Serverless, I wrote a long article, Programming without Code, on how to design no-code programming. In the last two years, more and more so-called “low-code platforms” have emerged, but they are not the kind I had in mind, because they lack key DSL abstractions. The design of a no-code system architecture is not really the topic of this article, though. What matters here is that these platforms set off a wave of enthusiasm among developers for cloud development:

  • Cloud design. BeeArt, for example, moves much of the design work to the cloud.
  • Cloud IDE. VSCode Remote and Eclipse Theia, for example, let new applications be developed remotely.
  • Cloud development environment. Nocalhost, for example, can solve cloud development problems for existing systems.

There is one counterexample, though: “cloud hosting”. That remote development model is not cloud development. A whole series of productivity tools is migrating to the cloud, from local IDEs to cloud IDEs and from native design software to cloud design tools, which will let us do all of our development in the cloud (in the browser) in the future. For us developers, what matters is not the result but how to implement such an architecture.

Introduction 2: Code architecture and operating systems

A layered architecture is just a matter of creating folders.

Now, let’s talk about something even more interesting: the way code exists. In today’s software development, we design layered architectures with directories as boundaries and files as the carriers of code. One of the main reasons is that we rely on the operating system’s storage, the file system. Files and tree-shaped directories are abstract logical concepts that humans accept easily.

Layered architecture vs modules vs packages

If you are familiar with Java, you will have noticed that the operating system constrains Java’s software architecture. The most visible case is the package: a directory is a package, so com.phodal.water corresponds to com/phodal/water. Today we take it for granted that directories are the mechanism for managing packages. Clean Architecture, which has become popular in recent years, is a circular (or onion) architecture, yet it too is confined to directories, which makes the final implementation of the system less intuitive.

(PS: Due to space constraints, I won’t show you more details here.)

If you are not familiar with clean architecture or layered architecture, you can refer to my previous articles and open source projects:

  • Clean Front-end Architecture: github.com/phodal/clea…
  • Layer Architecture: github.com/phodal/laye…

File vs class

On Unix/Linux systems, everything is a file.

A good rule of thumb in programming is one class per file. Languages other than Java do not enforce this, but following it keeps the implementation readable and maintainable. At the same time, we are used to putting related behavior into the same class, that is, the rich domain model. In my open source ebook, The System Refactoring and Migration Guide, I have been trying to convince people to do just that.

In the past, we have been bound by these file-based rules. So consider this question: what if we could break away from the operating system? If files no longer constrain classes, we do not need rules of this kind.

Introduction 3: Intermediate representation of code

To implement the concepts above, we need to redesign the entire system, and one key part of that is the intermediate representation of the code. If you are familiar with compiler principles, you know that code takes the form of a programming language only when it is presented to a human.

The development state

It does not matter whether the code is in Chinese, English, or oracle bone script, as long as what we see is easy to read. The code we write is for ourselves anyway; the characters are eventually converted, layer by layer, into special formats.

To support a programming language with features such as syntax highlighting and IntelliSense, IDEs and editors each have to re-implement AST-like parsing for that language: LSP in VSCode, PSI in IntelliJ IDEA, TextMate grammars for highlighting in TextMate/VSCode, and so on.

So consider this: if code is stored in an AST-like form, editors no longer need to implement this kind of parsing. They only need to present the stored form in a programming language that humans are familiar with.

Compiled/run state

During the six months I spent designing Datum (originally named Charj) with my colleagues, I analyzed the intermediate representations of some of the major languages on the market, such as the bytecode of Python and the JVM and Rust’s MIR. They are all intermediate representations of one kind or another, and they do not differ much; the principles are similar. Each is transformed from the AST produced in the previous step and redesigned to sit closer to the machine.

At this point, a more interesting question arises: if a part of the code has not changed, does it need to be recompiled? If a particular part is not touched, its compiled form does not change either.

Even more interesting, every change then only needs to be applied as a patch, and the application runs again.
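The idea that unchanged code need not be recompiled can be sketched in a few lines. This is only an illustration under stated assumptions, not how Datum or Water works: Python’s own ast module stands in for the stored intermediate form, and the IncrementalCompiler class and its fields are hypothetical names. Each function is hashed by its AST, and only functions whose hash has changed are compiled again.

```python
import ast
import hashlib


class IncrementalCompiler:
    """Hypothetical sketch: recompile a function only when its AST changes."""

    def __init__(self):
        self.cache = {}      # function name -> (AST hash, compiled code object)
        self.compiled = []   # names that were (re)compiled by the last build

    def build(self, source):
        self.compiled = []
        for node in ast.parse(source).body:
            if not isinstance(node, ast.FunctionDef):
                continue
            # The AST carries no comments or whitespace, and ast.dump() omits
            # line numbers by default, so only a real change alters the hash.
            digest = hashlib.sha256(ast.dump(node).encode("utf-8")).hexdigest()
            cached = self.cache.get(node.name)
            if cached is not None and cached[0] == digest:
                continue  # unchanged: keep the cached code object
            module = ast.Module(body=[node], type_ignores=[])
            code = compile(module, f"<{node.name}>", "exec")
            self.cache[node.name] = (digest, code)
            self.compiled.append(node.name)
        return {name: code for name, (_, code) in self.cache.items()}


# Version 2 changes only the body of double() (and adds a blank line).
v1 = "def add(a, b):\n    return a + b\n\ndef double(x):\n    return x * 2\n"
v2 = "def add(a, b):\n    return a + b\n\n\ndef double(x):\n    return x + x\n"

compiler = IncrementalCompiler()
compiler.build(v1)
first_build = list(compiler.compiled)    # both functions are compiled
compiler.build(v2)
second_build = list(compiler.compiled)   # only double() is compiled again
print(first_build, second_build)
```

In this sketch the “patch” is simply the set of functions whose hash changed; a real system would also have to track dependencies between functions and drop cache entries for deleted ones.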

New code architecture: Water

Programming languages are written for people to read; machine code is written for machines to run.

Based on the ideas above, we can design a new generation of code architecture: Water. The shape of the code no longer matters, and neither does the language you write it in; all that is needed is a unified back end (a compiler back end). Let’s start with the first version of the architecture definition:

Water coding architecture: the output of human programming is no longer a string of text in some programming language. A programming language is only how the code is presented in the UI while editing; the stored form is a series of intermediate representations of the code, such as AST, HIR, and MIR. Code is no longer tied to the operating system and the file system, and it can be managed in smaller architectural units, namely functions. Code no longer needs to be stored as files, and the architecture of the system is no longer a hierarchy divided by directories.

From this definition, it follows that the architecture has the following characteristics:

Language independence. Because code is stored as an AST DSL, developers can work in any language. For example, if developer A writes in Java and developer B writes in JavaScript, both are converted into a unified AST when stored. Developer B chooses JavaScript as the rendering language in the editor, so the code written by A appears to B as JavaScript rather than Java. Developer A chooses Java as the rendering language, so even if a third developer C writes in Golang, A still sees it rendered as Java. (PS: Of course, this is only theory, and many questions remain open.)
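A minimal sketch of this rendering idea, under loud assumptions: the JSON node layout below (kind, name, params, returns, body) is invented for illustration and is not the Datum or Water format, and the two renderers only cover a single function with one return expression. The point is that one stored tree can be shown to A as Java and to B as JavaScript.

```python
import json

# Hypothetical stored form: a tiny language-neutral AST, encoded as JSON.
STORED = json.dumps({
    "kind": "function",
    "name": "add",
    "params": [{"name": "a", "type": "int"}, {"name": "b", "type": "int"}],
    "returns": "int",
    "body": {
        "kind": "return",
        "value": {
            "kind": "binary",
            "op": "+",
            "left": {"kind": "name", "id": "a"},
            "right": {"kind": "name", "id": "b"},
        },
    },
})


def render_expr(node):
    """Render an expression node; the syntax is shared by both targets here."""
    if node["kind"] == "name":
        return node["id"]
    if node["kind"] == "binary":
        left = render_expr(node["left"])
        right = render_expr(node["right"])
        return f'{left} {node["op"]} {right}'
    raise ValueError(f'unsupported node kind: {node["kind"]}')


def render_java(fn):
    params = ", ".join(f'{p["type"]} {p["name"]}' for p in fn["params"])
    body = render_expr(fn["body"]["value"])
    return f'{fn["returns"]} {fn["name"]}({params}) {{ return {body}; }}'


def render_javascript(fn):
    params = ", ".join(p["name"] for p in fn["params"])
    body = render_expr(fn["body"]["value"])
    return f'function {fn["name"]}({params}) {{ return {body}; }}'


tree = json.loads(STORED)
java_view = render_java(tree)              # what developer A sees
javascript_view = render_javascript(tree)  # what developer B sees
print(java_view)        # int add(int a, int b) { return a + b; }
print(javascript_view)  # function add(a, b) { return a + b; }
```

Rendering is the easy direction. Turning an edit made in the JavaScript view back into the shared tree, and deciding what to do with type information that one language has and the other lacks, are among the open questions mentioned above.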

Fine-grained permission management. With today’s code management, developer permissions are granted per repository. Under the new architectural pattern they can be controlled down to the function level, mainly because code has become data and developers only write models and behaviors. Combined with the Typeflow idea proposed by our company, permissions can be managed at function granularity, so that a particular developer is only allowed to modify a particular set of business code.
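Once every function is a record rather than a region of a file, a permission check per function becomes straightforward. The following is a rough sketch only: FunctionStore, grant, and update are hypothetical names, the function ids are made up, and it does not attempt to model Typeflow.

```python
from dataclasses import dataclass, field


@dataclass
class FunctionStore:
    """Hypothetical store in which every function is a separately guarded record."""
    functions: dict = field(default_factory=dict)  # function id -> stored form
    grants: dict = field(default_factory=dict)     # developer -> set of function ids

    def grant(self, developer, function_id):
        self.grants.setdefault(developer, set()).add(function_id)

    def update(self, developer, function_id, new_form):
        # The check is made per function, not per repository.
        if function_id not in self.grants.get(developer, set()):
            raise PermissionError(f"{developer} may not modify {function_id}")
        self.functions[function_id] = new_form


store = FunctionStore()
store.grant("alice", "billing.calculate_tax")
store.update("alice", "billing.calculate_tax",
             {"kind": "function", "name": "calculate_tax"})

try:
    store.update("bob", "billing.calculate_tax",
                 {"kind": "function", "name": "hacked"})
    bob_allowed = True
except PermissionError:
    bob_allowed = False

print(bob_allowed)  # False: bob was never granted this function
```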

Automated architecture. Classical architectures rely on folders: code with similar behavior and functionality is grouped into folders (also known as packages). Once files and folders are removed from the architecture, each model and its behavior exists in a new form. If we represent them the way a graph database represents data and its relationships, the architecture can be optimized automatically, for example by deriving package definitions with a clustering algorithm.
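As a deliberately simple illustration of deriving packages from relationships, the sketch below groups functions by the connected components of their call graph. Connected components are only a stand-in here for whatever clustering algorithm a real architecture engine would use, and the function names and the suggest_packages helper are hypothetical.

```python
from collections import defaultdict


def suggest_packages(calls):
    """Group functions into candidate packages.

    Uses the connected components of the undirected call graph, the simplest
    possible form of clustering: functions that never call each other, even
    indirectly, end up in different packages.
    """
    graph = defaultdict(set)
    for caller, callee in calls:
        graph[caller].add(callee)
        graph[callee].add(caller)

    seen = set()
    packages = []
    for start in sorted(graph):
        if start in seen:
            continue
        component = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        packages.append(sorted(component))
    return packages


# Hypothetical call edges (caller, callee) taken from the stored code graph.
calls = [
    ("create_order", "validate_order"),
    ("validate_order", "load_customer"),
    ("render_invoice", "format_money"),
]
packages = suggest_packages(calls)
print(packages)
# [['create_order', 'load_customer', 'validate_order'], ['format_money', 'render_invoice']]
```

A realistic codebase is usually one large connected graph, so a practical engine would need a clustering method that weighs edges (for example by call frequency or shared data) rather than plain connectivity.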

Everything is data. On Unix/Linux systems, everything is a file; in the cloud, everything is data (although, in a sense, files are also a kind of data). The code itself is data.

Some other interesting things:

Language database. Once programming languages are converted into data, we face two challenges: what form the data takes and how to store it. Even in a 5G world we still have to consider transmission over 4G networks, so the transfer format may be some kind of binary package, like the dex package on Android, that also carries other rich information.
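To make the idea of a binary package concrete, here is a minimal sketch that frames a tree as a header plus a compressed payload. Everything about the layout is an assumption made for illustration: the WAST tag, the 8-byte header, and zlib-compressed JSON as the payload are not a real format and have nothing to do with dex internals.

```python
import json
import struct
import zlib

MAGIC = b"WAST"      # hypothetical 4-byte tag, not a real file format
HEADER = ">4sI"      # tag + payload length as a big-endian unsigned int
HEADER_SIZE = struct.calcsize(HEADER)


def pack_ast(tree):
    """Encode a JSON-compatible tree as header + zlib-compressed payload."""
    text = json.dumps(tree, separators=(",", ":"))
    payload = zlib.compress(text.encode("utf-8"))
    return struct.pack(HEADER, MAGIC, len(payload)) + payload


def unpack_ast(blob):
    """Decode a blob produced by pack_ast, checking the tag and the length."""
    magic, length = struct.unpack(HEADER, blob[:HEADER_SIZE])
    if magic != MAGIC:
        raise ValueError("not a packed AST")
    payload = blob[HEADER_SIZE:HEADER_SIZE + length]
    if len(payload) != length:
        raise ValueError("truncated package")
    return json.loads(zlib.decompress(payload).decode("utf-8"))


tree = {
    "kind": "function",
    "name": "add",
    "params": ["a", "b"],
    "body": {"kind": "return", "value": {"kind": "binary", "op": "+"}},
}
blob = pack_ast(tree)
restored = unpack_ast(blob)
print(restored == tree)  # True: the round trip preserves the tree
```

The header is where the “other rich information” could live in a fuller design, for example a format version or a content hash that lets the receiver skip trees it already has.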

Edit mode takes precedence. Beyond the constraints of the language, there is another factor to consider in this architectural pattern: rendering and interaction in the editor/IDE. We need fast editing and fast feedback, so editing takes precedence.

There are other interesting things that we can imagine.

Beyond Serverless

This brings to mind another interesting concept: Serverless.

A Serverless architecture refers to applications that rely heavily on third-party services (back end as a service, or “BaaS”) or on custom code running in ephemeral containers (function as a service, or “FaaS”). The function is the smallest abstract unit of the language runtime in a Serverless architecture. In this architecture we do not care how much CPU, RAM, or any other resource a function needs, only how long it runs, and we pay only for that time. (From the Serverless Architecture Application Development Guide.)

The deployment model of the new architecture is very similar to Serverless, but our system remains a complete whole and does not have to rely on a large number of online services. Because compilation and development are integrated, deployment is finished the moment development is finished. In some respects the Water architecture is faster than Serverless and better suited to applications with complex architectures.

PS: I will introduce this pattern in a new article.

Model-behavior separation

Behavior is built around the model, but under this mechanism the code is no longer constrained by the model. Sooner or later, then, we have to solve the database problem.

Today, our programming languages are constrained by models, and models are shaped by databases. If we no longer want to depend on the model, we need to devise a new mechanism that frees us from the database, and presumably invent a new database.

Looking back, the database itself is also a kind of data. It would then be interesting to look for a better data model from existing papers.

Other

I am still studying and thinking about this new architecture model, and there are still great uncertainties in the future. But it’s really fun.

You are also welcome to join the discussion and research on GitHub: github.com/phodal/wate…

Related articles:

  • Cloud R&D: R&D is Code: github.com/phodal/clou…
  • “Code: from research and development of low code, cloud to cloud”: www.phodal.com/blog/codify…
  • Programming without Code: www.phodal.com/blog/low-co…
  • The Charj – code of the language: www.phodal.com/blog/charj-…