devlog: webrtc screen sharing

posted on 15.12.2024

Welcome back to a new entry in the series “migrating a group of ungrateful gamers off of discord”. Today we’re tackling a new feature: the one spoiled by the title, screen sharing. I thought a website would be an ideal solution for it, because there are so many web apis nowadays, and besides, I had already dealt with windows native screen recording in my life, and fuck that, together with the whole OS. Surprisingly, I didn’t find any such service already existing: they all implement useless features like voice and webcam; who needs that, and in a browser? So it’s time to get back into web development, maybe learn something new, and gather some better facts for your future arguments with webshits.

Here’s the target design: the host opens the web page, starts presenting their screen or a window, and sends the link to the watchers. Anyone with the link can access the stream; the links are ephemeral and random. All streaming is done peer to peer of course, why involve a server at all. Sounds simple enough. Let’s get programming!
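A minimal sketch of the “ephemeral and random” link part, under my own assumptions: the room id is 16 random bytes from the Web Crypto `crypto.getRandomValues` (available in browsers, Deno and Node 19+), and it lives in the URL fragment, which browsers never send to the server. The helper names and the example.org address are hypothetical.

```typescript
// Hypothetical helper: mint an unguessable room id for a screen-sharing link.
// Assumes the Web Crypto global (browsers, Deno, Node 19+).
function makeRoomId(byteLength = 16): string {
  const bytes = new Uint8Array(byteLength);
  crypto.getRandomValues(bytes);
  // Hex-encode so the id is URL-safe without any escaping.
  return Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join("");
}

// Put the id in the fragment: the "#..." part is never sent in HTTP requests.
function makeRoomLink(base: string, roomId: string = makeRoomId()): string {
  const url = new URL(base);
  url.hash = `room=${roomId}`;
  return url.toString();
}

// Watcher side: pull the id back out of a link, or null if it has none.
function roomIdFromLink(link: string): string | null {
  const hash = new URL(link).hash; // "#room=..." or ""
  const match = /^#room=([0-9a-f]+)$/.exec(hash);
  return match ? match[1] : null;
}
```

A watcher page would call `roomIdFromLink(location.href)` to find out which room to join.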

the first yak shave

My recent adventures with web development left me bitterly disappointed in the longevity of the tools. If you have dependencies, they will be deprecated and yanked; if you have a build tool, it will be written in python2 and not be available anymore. There is hope in longer-lived systems like (fucking) C++ and (lovely) Qt, and they would even work in browsers. Except that’s an even worse idea than the one I have.

I’m inspired by the project called PureScript without Node. How about I try something similar and use JavaScript without node?

finding the compiler

Opening the first search result for “typescript alternative compilers”, I get the following list: tsc, swc, ezno, bun, deno. This is how I learned that bun doesn’t actually typecheck files, so it’s out. Tsc is also obviously out, because it is itself written in javascript. Swc doesn’t typecheck either; apparently a “build tool” is something that takes garbage in and gives garbage out. And we’re left with the last two blazingly-fast solutions, how fitting, because I just love rust so much.

Ezno seems like a cool project and a nice idea. It warns me immediately that it “is in active development and does not currently support enough features to check existing projects”. At a quick glance, it doesn’t support, for example, async-await or the unknown type. That’s sad, but I wish them luck.

I used deno once and found it ok, especially compared to using npm. But, today I learned, it doesn’t actually implement typechecking itself! It bundles the tsc compiler and runs it in its own runtime. How lazy of them; why not just take ezno and improve your compilation speed like a hundredfold? God, I hate how slow tsc is.

One final solution: why do I actually need a compiler? Browsers can already run javascript, and javascript is a high-level language. I need a typechecker, but fun news: you can get typescript’s typechecking without writing typescript, because there is a special syntax for supplying types in comments. This way I can typecheck the code now, and deploy it in the future without any extra steps.

And for typechecking the comments I’ll pick.. deno I guess, beats using tsc with node.

learning jsdoc

Lazily browse this document: https://www.typescriptlang.org/docs/handbook/jsdoc-supported-types.html#type

let foo: Bar = baz();
// Becomes
/** @type {Bar} */ let foo = baz();

A little verbose, but I hope I don’t have to use it often because of inference. Tsc does have local inference, right?

function foo(bar: Baz): Kek {}
//Becomes

/**
 * @param {Baz} bar
 * @returns {Kek}
 */
function foo(bar) {}

Kind of similar to Haskell, with the type declaration on the line above the body. Except incredibly verbose.

type Frame = {
    header: Header;
    body: Uint8Array;
};
//Becomes

/**
 * @typedef {Object} Frame
 * @property {Header} header
 * @property {Uint8Array} body
 */

// Alternative
/** @typedef {{ header: Header, body: Uint8Array }} Frame */

I’ve only ever seen erlang’s type definitions from afar, but this reminds me of them. Or maybe of clojure, another faint acquaintance.

function constant<A, B>(a: A, b: B): A { return a; }
//Becomes

/**
 * @template A, B
 * @param {A} a
 * @param {B} b
 * @returns {A}
 */
function constant(a, b) { return a; }

From Haskell to PureScript, with explicit foralls. Hey, there are even constraints and predicates! I don’t understand right now how they are supposed to work, but I’ll keep them in mind.
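For reference, as far as I can tell a “constraint” is TypeScript’s `extends` bound on a generic, spelled `@template {Constraint} T` in JSDoc, and a “predicate” is a return type of the form `x is T` (`@returns {x is T}` in JSDoc). A small TypeScript sketch of both, with made-up function names:

```typescript
// Constraint: T can be anything, as long as it has a numeric `length`.
// JSDoc spelling: /** @template {{ length: number }} T */
function longest<T extends { length: number }>(a: T, b: T): T {
  return a.length >= b.length ? a : b;
}

// Predicate: when this returns true, the compiler narrows `x` to string.
// JSDoc spelling: /** @returns {x is string} */
function isString(x: unknown): x is string {
  return typeof x === "string";
}

const values: unknown[] = ["ab", 3, "abc"];
// Because isString is a predicate, filter's result is typed as string[].
const strings = values.filter(isString);
```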

There are ES6 classes; I didn’t know modern javascript already supported those, I thought they were still compiled out. I’m pretty sure I won’t use them either way: all I need are objects and unions, functions and parametric polymorphism. I would really like newtypes, but alas.
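The closest thing to newtypes I know of in TypeScript is the “branded type” trick: intersect a primitive with a phantom tag property that never exists at runtime, so two kinds of strings stop being interchangeable. A sketch with hypothetical `RoomId`/`PeerId` types:

```typescript
// Phantom brand: `__brand` only exists in the type system, never at runtime.
type Brand<T, Tag extends string> = T & { readonly __brand: Tag };

type RoomId = Brand<string, "RoomId">;
type PeerId = Brand<string, "PeerId">;

// "Smart constructors": the intended way in, checking the input once.
function roomId(raw: string): RoomId {
  if (!/^[0-9a-f]{32}$/.test(raw)) throw new Error(`not a room id: ${raw}`);
  return raw as RoomId;
}

function peerId(raw: string): PeerId {
  return raw as PeerId;
}

function joinRoom(room: RoomId, peer: PeerId): string {
  return `${peer} joined ${room}`;
}
```

`joinRoom(roomId(id), peerId("alice"))` compiles, while swapping the two arguments is a type error even though both are plain strings at runtime. Unlike a real newtype it can be bypassed with an `as` cast anywhere, so it’s a convention with compiler support rather than a guarantee.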

learning to learn

I’ve got to say, I’m so spoiled by Qt. If I wanted to build screen sharing software with Qt, I would open its examples directory, copy this one and this one, and I’d basically be done. MDN is pretty good, but it’s really lacking in examples. Maybe that’s why people like chatgpt: it can generate some code that almost works, which you then file down into what you wanted? I sometimes do it as well, and it does give me some inspiration when I’m hopelessly lost in a new topic and all the examples are shit.

The first example I find is on https://webrtc.github.io - seems legit. Except they use some adapter library, ah fuck. I hoped webrtc would be standardized by now; I hoped that if it’s a browser technology, I could just rely on it existing in the future. Why do we only carry our old failures with us in the bloated standards, while the new apis get deprecated and dropped left and right? We might as well throw away the old cruft too, if we already don’t care about our sites opening in the future.

The second example I stumbled onto by accident, and this time it’s a lot better and hosted on MDN; it’s just really hard to find by browsing the site, and it’s out-SEO’d by garbage, even in kagi. I’ll put a link here for you: https://developer.mozilla.org/en-US/docs/Web/API/Media_Capture_and_Streams_API/Taking_still_photos#the_startup_function. For me it worked after a light kick in the side. Great start!

What surprises me is how high-level javascript is: the atomic objects you use to manipulate video are streams. You obtain a ‘stream’, you assign the ‘stream’ to a video element, and then it just works. No mucking around with the event loop, no synchronizing frames, everything is already done for you. This is so simple even coming from Qt, where everything is also done for you, just a lower-level everything. This approach comes with a downside though: having an opaque stream be your only way of manipulating video means that if you want to do something interesting with frames or timings, you just can’t! Either you use the limited hooks the interface exposes to you, or you’re fucked, start from scratch. Which reminds me of that cool project where a guy built a giant animated panneau, and they had a million problems with the inflexible video element and its unexpected behaviours.
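Roughly, the capture-and-preview dance looks like this sketch. To keep it checkable outside a browser, the parameters use minimal structural interfaces with hypothetical names; the real `navigator.mediaDevices` and an `HTMLVideoElement` fit them. Browsers expect `getDisplayMedia` to be triggered by a user gesture, such as a button click.

```typescript
// Structural slices of the DOM types this needs (hypothetical names).
interface DisplayCapture<S> {
  getDisplayMedia(constraints: { video: boolean; audio: boolean }): Promise<S>;
}
interface VideoSink {
  srcObject: unknown;
  play(): Promise<void>;
}

// Ask the user for a screen or window, show it locally, and hand the stream
// back so it can later be added to peer connections.
async function startPresenting<S>(devices: DisplayCapture<S>, preview: VideoSink): Promise<S> {
  // Opens the browser's picker; rejects if the user cancels.
  const stream = await devices.getDisplayMedia({ video: true, audio: false });
  // The whole "pipeline": point the element at the stream. No frame loop.
  preview.srcObject = stream;
  await preview.play();
  return stream;
}
```

In the page that would be something like `button.onclick = () => startPresenting(navigator.mediaDevices, video)`.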

In the end, the best example was this one: https://github.com/shanet/WebRTC-Example - I built the client completely based on it. It’s not great code, but it’s simple and small. It just has a small problem with a race: if the websocket connection is not open yet, you’re fucked with a very nondescriptive error. This doesn’t really show in this code, but if you try to reorder it a little like I did, you’ll hit it immediately.
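One way around that race, sketched here as my own workaround rather than what the example does: wrap the signaling socket and queue outgoing messages until it opens. `SignalSocket` is a hypothetical structural slice of the browser `WebSocket`, and `readyState` 1 is `WebSocket.OPEN`.

```typescript
// The parts of WebSocket this wrapper needs; a real WebSocket satisfies it.
interface SignalSocket {
  readonly readyState: number;
  send(data: string): void;
  addEventListener(type: "open", listener: () => void): void;
}

const OPEN = 1; // same value as WebSocket.OPEN

// Returns a `send` that is safe to call before the socket has connected.
function queuedSender(socket: SignalSocket): (message: unknown) => void {
  const pending: string[] = [];
  socket.addEventListener("open", () => {
    // Flush everything that was sent too early, in order.
    for (const data of pending.splice(0)) socket.send(data);
  });
  return (message) => {
    const data = JSON.stringify(message);
    if (socket.readyState === OPEN) socket.send(data);
    else pending.push(data);
  };
}
```

Messages sent after the socket closes just pile up in the queue; a real client would want to handle `close` too.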

explanation

Pre-rant: for some reason all examples treat the parties as equal. But they never are; it’s the wrong abstraction, especially for examples, and I can prove it, because the authors then give up and say “yeah, there is an initiator and a responder, but it’s mostly the same, and where it’s not the same we can just ignore the differences, maybe, right?” And it sucks for understanding.

To establish a webrtc connection, you need a pre-established bidirectional channel, where the parties will exchange SDP and ICE messages. And they will send a lot of them: one SDP message each and about a million ICE messages. Again, all examples leave this channel out of scope, but in reality it’s important to have it, and you won’t get a connection from just sharing one small string of data as I initially thought. (Well, you still totally can, but that will be your own protocol and not webrtc.)
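To make the asymmetry from the pre-rant explicit, here is a hypothetical shape for the signaling protocol, where the host (the one with the screen) always sends the offer and the watcher always answers. The message names and the routing function are my own invention, not part of any spec; only the `sdp` and `candidate` payloads correspond to what RTCPeerConnection produces.

```typescript
type Role = "host" | "watcher";

// Simplified shape of RTCIceCandidate.toJSON() output.
interface IceCandidateJson {
  candidate: string;
  sdpMid: string | null;
  sdpMLineIndex: number | null;
}

// What travels over the signaling channel; `sdp` is the session description text.
type Signal =
  | { kind: "join"; from: "watcher" }
  | { kind: "offer"; from: "host"; sdp: string }
  | { kind: "answer"; from: "watcher"; sdp: string }
  | { kind: "ice"; from: Role; candidate: IceCandidateJson };

// Which messages each side should ever receive: the roles are NOT symmetric.
function accepts(me: Role, signal: Signal): boolean {
  switch (signal.kind) {
    case "join":
    case "answer":
      return me === "host";
    case "offer":
      return me === "watcher";
    case "ice":
      return signal.from !== me; // both sides trickle candidates to each other
  }
}
```

The flow this encodes: a watcher joins, the host creates an offer carrying its tracks, the watcher answers, and then both sides trickle ICE candidates.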

Another important thing: screen capture (and with it the whole point of this webrtc page) only works when the page is opened via https. Great fucking convenience for local development, guys. Who the fuck creates an environment like this.
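Since the failure mode is an API that is simply missing (`navigator.mediaDevices` is undefined outside a secure context), it may be worth checking up front and saying why. A sketch; `captureUnavailableReason` is a made-up helper that takes its inputs as parameters, so in the page it would be called as `captureUnavailableReason(window.isSecureContext, navigator.mediaDevices)`. Note that browsers treat http://localhost as a secure context, so local development on the same machine works; it’s opening the page from another device over a LAN address that needs https.

```typescript
// Returns a human-readable reason why screen capture can't work, or null if it should.
function captureUnavailableReason(
  isSecureContext: boolean,
  mediaDevices: { getDisplayMedia?: unknown } | undefined,
): string | null {
  if (!isSecureContext) {
    return "This page must be opened over https (or from localhost) to capture the screen.";
  }
  if (mediaDevices === undefined || typeof mediaDevices.getDisplayMedia !== "function") {
    return "This browser does not support screen capture (no getDisplayMedia).";
  }
  return null;
}
```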

The WebRTC api is concerned with establishing a P2P connection between browsers. The closely related MediaDevices api is concerned with creating media streams from screen capture, microphone capture or other sources, to send over webrtc. I mention the second one because the MDN example that I thought was about webrtc is actually a MediaDevices example, but also because if you don’t add a media stream (or a data channel), the webrtc connection just won’t get established at all. And without any good error message, simply a fuck you and no connection at all. Convenient, right? Have fun debugging failing connections, you absolute fool, you idiot, you thought the web was getting better; it was just getting worse in a new direction.
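My understanding of why nothing happens without media: an offer created with no tracks (and no data channel) contains no media sections, so there is nothing for ICE to negotiate and no candidates ever appear. A sketch of the host side that fails loudly instead; `PeerLike`, `StreamLike` and `TrackLike` are hypothetical structural slices of `RTCPeerConnection`, `MediaStream` and `MediaStreamTrack`.

```typescript
interface TrackLike { readonly kind: string }
interface StreamLike { getTracks(): TrackLike[] }
interface DescriptionLike { type: string; sdp?: string }
interface PeerLike {
  addTrack(track: TrackLike, stream: StreamLike): unknown;
  createOffer(): Promise<DescriptionLike>;
  setLocalDescription(description: DescriptionLike): Promise<void>;
}

// Host side: attach the captured tracks *before* creating the offer.
async function createHostOffer(pc: PeerLike, stream: StreamLike): Promise<DescriptionLike> {
  const tracks = stream.getTracks();
  if (tracks.length === 0) {
    // Without tracks the offer has no m= sections and ICE silently does nothing.
    throw new Error("no media tracks to offer; capture the screen first");
  }
  for (const track of tracks) pc.addTrack(track, stream);
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);
  return offer; // send offer.sdp to the watcher over the signaling channel
}
```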

How is a connection established? First the two parties agree on a session via SDP messages: one makes an offer and the other answers. Then they try to find a way to each other via a flurry of ICE messages. In the best case, a STUN server helps each of them discover their public address, so they can traverse the NATs and make a direct UDP path between them. In the worst case, they fall back to a TURN server that relays the traffic between them. SDP and ICE messages need to be exchanged through some server in the middle at first. So you will also need a way for the parties to agree on a session, in order to send the SDP message that agrees on a new session.
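The STUN/TURN part lives in the `RTCPeerConnection` configuration. A sketch of building it: the public Google STUN address is the one most examples use, while the TURN url and credentials are placeholders for a relay you would have to run or rent yourself (for example with coturn). `buildIceServers` and `TurnSettings` are hypothetical names.

```typescript
// Shape of RTCConfiguration.iceServers entries (a structural subset).
interface IceServer {
  urls: string | string[];
  username?: string;
  credential?: string;
}

interface TurnSettings {
  url: string; // e.g. "turn:turn.example.org:3478" (placeholder)
  username: string;
  credential: string;
}

// STUN only helps each peer discover its public address; TURN relays the
// traffic when no direct path through the NATs can be found.
function buildIceServers(turn?: TurnSettings): IceServer[] {
  const servers: IceServer[] = [{ urls: "stun:stun.l.google.com:19302" }];
  if (turn) {
    servers.push({ urls: turn.url, username: turn.username, credential: turn.credential });
  }
  return servers;
}
```

In the browser this would be passed as `new RTCPeerConnection({ iceServers: buildIceServers() })`.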

WebRTC doesn’t support broadcasting one stream to many peers, even 10 years after its creation and after being mentioned in the docs as “coming soon”.
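So the host ends up fanning out by hand: one peer connection per watcher, each carrying the same captured tracks. A sketch of that bookkeeping; `Fanout`, the watcher ids and the `createPeer` factory are hypothetical, and in the browser the factory would be something like `() => new RTCPeerConnection(config)`.

```typescript
interface PeerHandle {
  close(): void;
}

// Keeps one peer connection per watcher, since there is no one-to-many mode.
class Fanout<P extends PeerHandle> {
  private readonly peers = new Map<string, P>();
  private readonly createPeer: (watcherId: string) => P;

  constructor(createPeer: (watcherId: string) => P) {
    this.createPeer = createPeer;
  }

  // A new watcher joined: give them a dedicated connection.
  add(watcherId: string): P {
    this.remove(watcherId); // a reconnecting watcher replaces their old peer
    const peer = this.createPeer(watcherId);
    this.peers.set(watcherId, peer);
    return peer;
  }

  remove(watcherId: string): void {
    this.peers.get(watcherId)?.close();
    this.peers.delete(watcherId);
  }

  get size(): number {
    return this.peers.size;
  }

  // Stop sharing: tear down every connection.
  closeAll(): void {
    for (const peer of this.peers.values()) peer.close();
    this.peers.clear();
  }
}
```

The cost of this design is that every extra watcher means another full copy of the stream going up the host’s connection.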

“WebRTC doesn’t do QoS”, the docs say. QoS is when you watch a youtube video and suddenly the quality drops because it decided that your connection is too slow. I’m dramatizing with the example, but thank god webrtc doesn’t do qos by itself; I want more control, not less.
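If you do want that control, one knob I know of is the sender’s encoding parameters, e.g. capping `maxBitrate` (bits per second) via `RTCRtpSender.getParameters()` and `setParameters()`. A sketch; `SenderLike` and friends are hypothetical structural slices of the real types, and the object passed back to `setParameters` should be the one you just got from `getParameters()`, modified in place.

```typescript
interface EncodingLike { maxBitrate?: number }
interface SendParametersLike { encodings: EncodingLike[] }
interface SenderLike {
  getParameters(): SendParametersLike;
  setParameters(parameters: SendParametersLike): Promise<void>;
}

// Caps every encoding of this sender; returns false if there is nothing to cap yet.
async function capBitrate(sender: SenderLike, bitsPerSecond: number): Promise<boolean> {
  const params = sender.getParameters();
  if (params.encodings.length === 0) return false;
  for (const encoding of params.encodings) encoding.maxBitrate = bitsPerSecond;
  await sender.setParameters(params);
  return true;
}
```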

pit of failure

I’ve got to say, typescript is still not a good language. It’s a lot better than raw-dogging JS, but there are still giant traps lying everywhere. While some other modern, beloved languages go for the “pit of success” approach, typescript is still at “I’m going to try my best”.

Here’s one big example: parsing messages. You see, we in cybersecurity know that 99% of failures come from external input, like messages from a server. What does typescript offer to parse messages? JSON.parse, which gives back any. Whyy, this is incredibly bad: it encourages you to restore ‘something’ and just chuck it into your code hoping that it works. It never does! Inevitably someone forgets something, or the api changes, or there is a malicious attacker, and at best you see the celebrated undefined is not a function, and at medium worstness you get a type confusion attack that drains bank accounts. This footgun just shouldn’t be here!

Alright, let’s imagine you’re a beginner programmer who knows a thing or two, so you google “typescript safe json parse”. The top result is still https://stackoverflow.com/questions/38688822/how-to-parse-json-string-in-typescript - which, you might notice, is an unsafe solution. If you google with “safe” in double quotes, you’ll get some better results, except they tell you to use third-party libraries like zod or typescript-is. This puts you right into another pit of failure with the labels “bloat” and “bitrot” and “complicated build” over it, but whatever.

Now let’s imagine you’re an early learner programmer who knows a thing or two, so you know how to validate inputs. So you write:

function validateMessage(msg: any): msg is Message {
    return true;
}

You wrote this as a placeholder to later fill in with proper logic; except you find out that it actually compiles! Turns out, the body of a type-predicate function is not checked against what it even claims. Cool feature, I’m sure.
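For contrast, a guard that actually earns its `msg is Message`, taking `unknown` instead of `any`. The compiler still won’t check that the body matches the claim, so it’s on you; here `Message` is a hypothetical two-variant signaling message.

```typescript
type Message =
  | { kind: "offer"; sdp: string }
  | { kind: "answer"; sdp: string };

function isRecord(x: unknown): x is Record<string, unknown> {
  return typeof x === "object" && x !== null && !Array.isArray(x);
}

// Every property the type promises is checked at runtime; nothing is assumed.
function validateMessage(msg: unknown): msg is Message {
  return (
    isRecord(msg) &&
    (msg.kind === "offer" || msg.kind === "answer") &&
    typeof msg.sdp === "string"
  );
}
```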

We have to imagine ourselves a medium programmer who knows a thing or three, so that we remember to write const msg: unknown = JSON.parse(str);, and then we have to “parse, don’t validate” - good thing the title was so catchy. Parsing sucks, but at least it sucks equally in basically any language. Now the only thing I don’t understand is: why does any exist at all? And more importantly, why is unsafety hiding around every corner? And since everything is better in threes, here’s the third any: when I do import * as Ws from "npm:ws", everything is again imported as any. Thanks for coming to my ted talk.
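And the “parse, don’t validate” version: start from `unknown`, and build a fresh, fully typed value (or an error) instead of blessing the input object. A sketch with a hypothetical message shape; `ParseResult` and `parseMessage` are made-up names.

```typescript
type Message =
  | { kind: "offer" | "answer"; sdp: string }
  | { kind: "ice"; candidate: string };

type ParseResult = { ok: true; value: Message } | { ok: false; error: string };

function parseMessage(text: string): ParseResult {
  let raw: unknown; // deliberately unknown: JSON.parse would otherwise hand us `any`
  try {
    raw = JSON.parse(text);
  } catch {
    return { ok: false, error: "not JSON" };
  }
  if (typeof raw !== "object" || raw === null) {
    return { ok: false, error: "not an object" };
  }
  // Safe to index now; every field below is still `unknown` until checked.
  const obj = raw as Record<string, unknown>;
  const kind = obj.kind;
  if (kind === "offer" || kind === "answer") {
    const sdp = obj.sdp;
    if (typeof sdp !== "string") return { ok: false, error: "sdp must be a string" };
    // Build a fresh value from the checked fields only; extra input keys are dropped.
    return { ok: true, value: { kind, sdp } };
  }
  if (kind === "ice") {
    const candidate = obj.candidate;
    if (typeof candidate !== "string") return { ok: false, error: "candidate must be a string" };
    return { ok: true, value: { kind, candidate } };
  }
  return { ok: false, error: `unknown kind: ${String(kind)}` };
}
```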

complaining about CSS

I’ve tried a lot of different GUI libraries and frameworks for different languages (still working on the article about GUI in rust), and let me tell you straight: HTML+CSS is the worst way to create interfaces, narrowly beating ncurses. HTML is.. fine if verbose, but oh my god CSS. The worst part about web layouting: there are no components[1]. You sort-of have them in CSS classes, but they still need you to copy the whole tree of children, all those divs inside divs, for each use of a component, instead of writing <my-super-element> and having its children appear and be styled. The second worst part is that CSS is completely undiscoverable. You can style anything with anything; except you can’t: certain properties like align-items apply only in certain contexts. And there are millions of properties; you’ll never guess which one you need to get the look you want. Quick, what’s the difference between justify-content, justify-items and align-items? Time’s up, you’ve just proven that it’s impossible to program.

I’ve been doing web sporadically, but the last time I seriously did it was around 2013-2014, with jquery and elbow grease. I’ve been hearing that the situation’s been getting better, and let me tell ya: it was; it just was also getting worse. For example, JS modules are.. good? We had other ways of splitting code into files and of doing encapsulation, but only modules work properly with typescript, so alright. But tell me, why don’t they work when the page is opened over the file:// protocol instead of http://? WHY? What divine purpose is there to just make the developer’s life worse?

And back to CSS, I’ll give it to you, flexbox is a lot better than the mess that was there before. It’s almost as good as just having fucking rows and columns. No caveats. You still haven’t fixed the mess that is element sizing though, so making two unrelated divs have the same width is an impossible task.

Speaking of impossible tasks, do you remember that funny article about centering things being impossible? I have proof: it is. I spent such a long time trying to get a unicode fullscreen icon centered; I tried all the justify- and align- properties there are, I tried flexbox and traditional layout, I asked artificial intelligences and real intelligences. This problem could be solved incredibly easily in QML: query the heights of everything involved, and shift the y position to match the middle. But this is CSS, and we don’t have declarative layout, we have I-hope-it-all-works-out layout, where you just write some incantations and pray. You know how I solved this? First, I replaced the unicode icon with an svg one; second, I prayed real hard. And finally, the working solution was one of align-content or align-items or vertical-align (again, you have 10 seconds to say what each of them does) (time’s up).

And for a minor funny papercut: I’m used to the padding property of layouts adding space between all child elements, not around the edges; here I automatically use it this way and swear every time the elements are still stuck together. Old dogs and their old habits, huh.

And when you’re finally done putting divs in their places, you open the page and instead of nice localized labels you see бНОПНЯ, because we live in 2024 and utf8 is still not the default charset.

getting it to work in chrome

Now for the fun part: browser compatibility. I thought this was a solved problem, especially for JS apis, but noo, we’re still in 2012, and of course me testing in firefox only means that it doesn’t work in chrome at all. Which is unfortunate, because of all the people I’m writing this for, only I use firefox.

I spent some time trying to solve this, and it seems like a big problem, and I’m really not having fun. So the alpha version, which I’m releasing right now, will only support firefox and safari (what? why is everyone complaining about safari? works perfectly for me).

My name is Morj, and see you in part two.


[1] Except for react, which is a whole different thing. There are also web components, but why does nobody use them? From my web mates I only hear groaning without an actual explanation; but they were lost to frameworks a long time ago.