One issue with Wasm is you essentially can't target it with a single-pass compiler, unlike just about any real machine. Wasm can only represent reducible control flow, so you have to pass your control-flow graph through some variation of the Relooper[1,2]. I don't know if upstream tcc can do that (there are apparently some forks?..).
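To make "irreducible" concrete, here is a small sketch of the kind of C that causes the trouble. The function name and the arithmetic are made up for illustration. The `goto middle` enters the loop without going through its header, so the cycle has two entry points and cannot be written directly as a single Wasm `loop`.

```c
#include <assert.h>

/* Hypothetical example: the cycle top -> middle -> top can be entered
   either at `top` (normal entry) or at `middle` (via the goto), so
   neither block dominates the other. That is irreducible control flow. */
int count_steps(int n, int enter_in_middle)
{
    int steps = 0;
    assert(n >= 0);
    if (enter_in_middle)
        goto middle;        /* second entry into the loop */
top:
    if (n <= 0)
        return steps;
    n--;
    steps++;
middle:
    steps += 10;
    goto top;
}
```

A native backend can emit this as plain jumps; a Wasm backend has to restructure it first, for example by duplicating code or introducing a dispatch variable.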
> you essentially can't target it with a single-pass compiler,
That might be true if your source language has goto, but for other languages that start with structured control flow, it's possible to just carry the structure through and emit Wasm directly from the AST.
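A rough sketch of what "carry the structure through" could look like, assuming a toy AST. Every name here (`struct node`, `gen`, `demo`) is hypothetical and not taken from any real compiler. A single recursive walk emits linear Wasm text, and a `while` node maps onto the usual `block`/`loop` pair.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical mini-AST, for illustration only. */
enum kind { CONST, GET, SET, ADD, LT, WHILE, SEQ };

struct node {
    enum kind kind;
    int value;                 /* CONST: literal; GET/SET: local index */
    const struct node *a, *b;  /* operands, or condition and body */
};

struct out { char buf[512]; size_t len; };

/* Append one instruction without an immediate. */
static void op(struct out *o, const char *text)
{
    int n = snprintf(o->buf + o->len, sizeof o->buf - o->len, "%s\n", text);
    assert(n >= 0 && (size_t)n < sizeof o->buf - o->len);
    o->len += (size_t)n;
}

/* Append one instruction with an integer immediate. */
static void op_i(struct out *o, const char *text, int imm)
{
    int n = snprintf(o->buf + o->len, sizeof o->buf - o->len, "%s %d\n", text, imm);
    assert(n >= 0 && (size_t)n < sizeof o->buf - o->len);
    o->len += (size_t)n;
}

/* One pass over the tree: children first, then the operator. */
static void gen(struct out *o, const struct node *n)
{
    switch (n->kind) {
    case CONST: op_i(o, "i32.const", n->value); break;
    case GET:   op_i(o, "local.get", n->value); break;
    case SET:   gen(o, n->a); op_i(o, "local.set", n->value); break;
    case ADD:   gen(o, n->a); gen(o, n->b); op(o, "i32.add"); break;
    case LT:    gen(o, n->a); gen(o, n->b); op(o, "i32.lt_s"); break;
    case SEQ:   gen(o, n->a); gen(o, n->b); break;
    case WHILE:
        op(o, "block");
        op(o, "loop");
        gen(o, n->a);            /* condition */
        op(o, "i32.eqz");
        op_i(o, "br_if", 1);     /* condition false: leave the outer block */
        gen(o, n->b);            /* body */
        op_i(o, "br", 0);        /* back to the loop header */
        op(o, "end");
        op(o, "end");
        break;
    }
}

/* Emits code for: while ($0 < $1) $0 = $0 + 1; */
const char *demo(struct out *o)
{
    static const struct node i = { GET, 0, 0, 0 }, n = { GET, 1, 0, 0 };
    static const struct node one = { CONST, 1, 0, 0 };
    static const struct node cond = { LT, 0, &i, &n };
    static const struct node sum = { ADD, 0, &i, &one };
    static const struct node body = { SET, 0, &sum, 0 };
    static const struct node loop = { WHILE, 0, &cond, &body };
    o->len = 0;
    gen(o, &loop);
    return o->buf;
}
```

No control-flow graph is ever built, which is why this only works when the source language has no way to jump into the middle of a construct.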
Sure, I was speaking in the context of C specifically. (In non-simplistic compilers, you may not want to preserve the source structure anyway—e.g. in Scheme or Lua with tail calls all over the place.)
I don’t want to become the switch-statement guy, but neither can I resist, apparently. There are no technicalities in what is allowed in a switch statement: the same things are allowed as with bare gotos. That is, a switch statement is a fancy goto, and case labels are just labels that look a bit funny. Except for the case labels being restricted to inside of the switch body, nesting doesn’t really come into it.
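A small illustration of the "fancy goto" point, with a made-up function name. The case labels sit inside a `for` body, and the switch jumps straight to them, in the same spirit as Duff's device.

```c
/* Hypothetical example: `case 1` and `case 2` are labels inside the
   loop body, so the switch can jump into the middle of the loop. */
int jump_into_loop(int start)
{
    int visited = 0;
    switch (start) {
    case 0:
        for (;;) {
            visited += 1;
    case 1:
            visited += 10;
    case 2:
            visited += 100;
            break;      /* leaves the for loop, not the switch */
        }
    }
    return visited;
}
```

With `start` equal to 1 the first increment is skipped entirely, exactly as if a `goto` had targeted the middle of the loop.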
So then the question becomes, which things are you allowed to jump over? In C++, I don’t really know, the restrictions seem fairly stringent. In C, you can jump over anything except a declaration using a variably modified type (i.e. a variable-length array, a pointer to one, etc.), but keep in mind that the variables whose declarations you’ve jumped over will be uninitialized even if the declaration does have an initializer.
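A minimal sketch of the C rule, with a hypothetical function name. The jump skips a declaration that has an initializer; the variable is in scope after the label, but the initializer never ran.

```c
/* Hypothetical example of jumping over an initialized declaration. */
int skipped_initializer(void)
{
    goto after;
    int x = 42;     /* in scope below, but this initializer is never executed */
after:
    x = 7;          /* assign before reading: x holds no defined value here */
    return x;
}

/* Not allowed in C: jumping into the scope of a variably modified type.
 *
 *     goto after;
 *     int vla[n];
 * after:
 *     ...
 *
 * The goto above would violate a constraint, so a conforming compiler
 * has to diagnose it. */
```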
If all you want to do is compile and run C code in the browser, you could run tcc in the blink x86_64 emulator, running in wasm.
It would take ~300Kb, less than the JS & CSS used in the average webpage.
The whole LLVM toolchain is a bit big. I think we can reduce the size much further. We actually looked into using tcc, but unfortunately tcc doesn’t have a wasm backend (for generating wasm output). It would be awesome if they added one!
Cranelift is a fast, secure, relatively simple and innovative compiler backend. It takes an intermediate representation of a program generated by some frontend and compiles it to executable machine code. Cranelift is meant to be used as a library within an "embedder".
It is in successful use by the Wasmtime WebAssembly virtual machine, for just-in-time (JIT) and ahead-of-time (AOT) compilation, and also as an experimental backend for the Rust compiler.
Cranelift is an optimizing compiler, but it aims to take a fresh look at which optimizations are necessary. We have explicitly avoided features -- such as advanced alias analysis or use of undefined behavior -- that have historically led to subtle miscompilations in other compilers. Cranelift consists of about 200 thousand lines of code; in contrast, e.g. LLVM consists of over 20 million lines of code, a hundred times larger. This difference also allows Cranelift to be relatively approachable to developers, researchers, auditors and others who wish to understand how it works.
I recently wanted to use tcc for a home-baked programming side project and was surprised to find it's no longer supported, at least not by Fabrice Bellard. Upstream git still has some light activity but no releases. I wasn't sure how good of an idea it is to rely on it as a code generator.
Very cool! I've been watching the "toolchains in Wasm" landscape for a while, and seeing a Clang/LLVM toolchain running in Wasm is awesome!
YoWASP has also had an LLVM toolchain working in Wasm for a while[1], although it seems like this version solves the subprocess problem by providing an implementation of `posix_spawn`, whereas the YoWASP one uses some patches to avoid subprocesses altogether.
My biggest question marks around this version are about runtime/platform support. As I understand it, this toolchain uses WASIX, which (AFAICT) works with Wasmer's own runtime and with a browser shim, but with none of the other runtimes. Are there plans to get WASIX more widely adopted across more runtimes, or to get WASIX caught up to the latest WASI standard (preview2)? Or maybe even better, bring the missing features from WASIX, like `posix_spawn`[2], to mainline WASI? I'd love to be able to adopt this toolchain, but it doesn't seem like WASIX support has really caught on across the other runtimes.
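For context on the subprocess problem, this is roughly what a compiler driver needs from the platform when it launches another tool as a child process. It is a sketch for an ordinary POSIX host, not code from either toolchain, and `run_tool` is a made-up helper name.

```c
#include <stddef.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Hypothetical helper: run argv[0] (looked up on PATH), wait for it,
   and return its exit status, or -1 if it could not be run or did not
   exit normally. */
int run_tool(char *const argv[])
{
    pid_t pid;
    int status;

    if (posix_spawnp(&pid, argv[0], NULL, NULL, argv, environ) != 0)
        return -1;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A runtime without any notion of child processes has nothing to back these two calls with, which is why the toolchain either needs an implementation of them or patches that keep everything in one process.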
It is a bit unfair to Wasmer, because it incurs the (presumed) overhead of `wasmer run ...`, but I could not figure out whether the actual clang binary is directly available after it has been downloaded the first time.
It's pretty misleading not to mention the performance overhead. That's an obvious downside and quite easy to benchmark. Skipping any discussion of performance feels like sweeping it under the marketing rug :/
A few weeks ago, I tried to compile Clang to WebAssembly, but got several different errors, and tried fixing a lot of them, but some of them seemed kind of impossible to fix, so I thought I would try again at a later date. However it seems I will not need to try again. I feel angry that someone made a convenient solution before I did, but also happy, because this probably implies that they made a consistent process to compile Clang for WASM.
It’s sadly a bit more of a proof of concept than a hackable project. The Docker build in the README did work last time I tried, and there is a demo site at https://jprendes.github.io/emception/, but I’ve failed to modify it in the past to do other things.
It is possible with some care. I was looking into this with WAForth, which compiles the wasm and loads it via a host function (i.e. it is the host's responsibility to make it available). I wanted to enable dynamic loading of words from disk, which requires some bookkeeping and shuffling a bunch of bytes around during compilation to write out the bits necessary to have the host do that linking. It isn't impossible to do, just tedious, and in my case having to write it in WAT is a pain.
The Clang WASI SDK weighs about 100Mb compressed. We optimized things a bit but still have a way to go (we are not yet compressing over the network). I believe we can serve everything in about 30Mb.
They insist on it because it is the proper way to measure data rates on serial bit streams where out-of-band encoding doesn't divide up on octet boundaries.
Like most things in software the use cases are the limits of one's imagination. The browser has always been a Turing complete development environment so this is just another demonstration.
Now all this needs is a simple OS running in a browser, that can edit and compile itself, post the resulting binary onto a WebDAV somewhere, and reload itself from there.
Then it becomes a fully self-sustaining OS that can live forever in a browser.
Do you have a proper link to the webtransport-p2p idea? I've done a few searches but I think there's some mix of current implementation and deprecated implementation somehow.
I don't know why it's fallen off, to be honest, or what was raised against it. Highly desirable to a lot of p2p folk, a very promising webrtc datatransport replacement.
All you need is a virtual filesystem of some sort, a way to download, a way to upload, an editor, a compiler, and a VT100 JS library. We already have WASI for the rest.
If the JS is too undesired, then perhaps go the old framebuffer graphics mode (e.g. a region of the WASM memory that is interpreted as an ASCII screen, or maybe even as a full bitmap buffer). Then the JavaScript side just needs to forward keyboard/mouse into memory and that screen region out of memory.
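A sketch of the ASCII-screen variant, assuming an 80x25 layout and function names that are invented here. The module keeps a flat byte array and tells the host where it is; the host would read that region out of linear memory and paint it.

```c
#include <stdint.h>
#include <string.h>

enum { COLS = 80, ROWS = 25 };   /* assumed screen size */

/* In a Wasm build this array lives in linear memory, so the host can
   read it directly once it knows the base address. */
static uint8_t screen[ROWS * COLS];
static int cursor;

/* Host-facing: where the screen region starts. */
uint8_t *screen_base(void) { return screen; }

void screen_clear(void)
{
    memset(screen, ' ', sizeof screen);
    cursor = 0;
}

void screen_putc(char c)
{
    if (c == '\n')
        cursor = (cursor / COLS + 1) * COLS;   /* start of the next row */
    else
        screen[cursor++] = (uint8_t)c;

    if (cursor >= ROWS * COLS) {               /* past the end: scroll one row */
        memmove(screen, screen + COLS, (ROWS - 1) * COLS);
        memset(screen + (ROWS - 1) * COLS, ' ', COLS);
        cursor = (ROWS - 1) * COLS;
    }
}

void screen_puts(const char *s)
{
    while (*s)
        screen_putc(*s++);
}
```

Keyboard input could go the other way through a second small buffer that the host writes into, but that part is left out here.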
WASIX already does all the other stuff you mentioned, including in the browser. The one thing it's missing is GUI, mainly because there's no standard GUI interface in POSIX.
[1] http://troubles.md/why-do-we-need-the-relooper-algorithm-aga...
[2] https://medium.com/leaningtech/solving-the-structured-contro...
https://old.reddit.com/r/C_Programming/comments/16kg48y/mind...
https://old.reddit.com/r/programminghorror/comments/ylc7f3/w...
Found a comment from the author of https://github.com/stclib/STC apparently and then came across this example:
https://stackoverflow.com/a/76887723
`gcc -E -ISTC/include co.c`
After running it through a preprocessor, it gives me this.
https://cranelift.dev/
We wait for grischka to decide when to announce a new release https://lists.nongnu.org/archive/html/tinycc-devel/2024-10/m...
[1]: https://discourse.llvm.org/t/rfc-building-llvm-for-webassemb...
[2]: https://github.com/WebAssembly/WASI/issues/414
Shameless plug: we are hosting a WebVM Hackathon next week (11-14 October) over Discord. For more information: https://cheerpx.io/hackathon
jslinux: 4.7s
wasmer: 1.3s
webvm: 1.2s
There's a xeus-cling Jupyter kernel, which supports interactive C++ in notebooks: https://github.com/jupyter-xeus/xeus-cling
There's not yet a JupyterLite (WASM) kernel for C or C++.
Expecting performance while compiling C in the browser feels redundant right now though.
I'm working on something similar, where students can compile Intel assembly and run it client-side: https://github.com/robalb/x86-64-playground
https://github.com/jprendes/emception
What syntax can be used to run emception? Thank you.
There is a fork at https://github.com/emception/emception that is trying to make it more production ready, but it looks like that may have stalled
I currently have a use case that uses a server running an emscripten build (using `-sMODULARIZE` and some exports, I suppose it’s not a true dylib).
Is this how big a clang toolchain usually is?
I only have to bring this up because network providers still insist on measuring bits
May as well count the 8/10 encoding in a CD-ROM as extra bits. Oh excuse me, extra megabytes
https://www.destroyallsoftware.com/talks/the-birth-and-death...
Ideally http3 over webtransport-p2p!
Then add some network discovery so we can advertise & find what's available on our networks!
What is it that needs reviving?
Because we can