> Moreover, once it [Threads] was developed, the infrastructure teams were given only two days' notice to prepare for its production launch. Most large organizations would take longer than two days just to draft a project plan involving dozens of interdependent teams, let alone execute it. At Meta, however, we quickly established war rooms across distributed sites, bringing together both infrastructure and product teams to address issues in real-time. Despite the tight timeline, the app’s launch was highly successful, reaching 100 million users within just five days, making it the fastest-growing app in history.
Keeping this ability to ship fast is kind of more impressive than anything else. It takes a lot of work to keep bureaucracy from growing and to stop lawyers and other functions from creating approval gates everywhere. Or at least to maintain war rooms that can get it done so quickly.
The secret to keeping deployment velocity high is having nothing important to lose. Facebook can drop or repeat a few posts and pics here and there and nobody will really care. Anyway, who are they gonna call?
The moment you start dealing with transacted data that requires immediate consistency and guaranteed persistence, then yeah, managers get nervous, lawyers get involved, and there's no way for guerrilla ops to hold up.
Nobody's paying for Meta's stuff except advertisers and something tells me that _their_ part of the pipeline is handled with _much_ greater care than the "consumer"-facing apps.
Threads has 320 million monthly active users and 100 million daily.
It’s certainly not a flop. It’s almost as big as X globally.
There’s a tremendous locality bias around Meta’s products. Social media use is very localized by geography and age. So if all your friends stopped using Facebook ten years ago, you assume that’s probably true everywhere, while in fact they’ve added several billion users on FB since then.
It seems like a flop not in user count but in that I'm not seeing it originate any valuable content.
I've seen several interesting posts from Bluesky referenced elsewhere already, but literally nothing from Threads despite it having had more users for longer.
I think their point is that you see Twitter and Bluesky links shared to sites like hackernews or Reddit (ie the internet aggregators), but not anything from threads
Elon Musk's trans daughter Vivian has been very active on Threads. Maybe she's on BlueSky now too, but I think she started on Threads.
(She's 20 years old, her father is the richest man in the world, and he regularly does interviews saying his child is literally dead to him. Enough drama for an Orson Welles script and a second-tier social media service.)
I frankly don't believe those numbers at all. They've massively pumped it through integration with Instagram, and have integrated dark patterns around profile creation and follows based entirely on Instagram social networks. Yes, a lot of people do actively use Threads, but it's absolutely not a first class social network [yet].
Form and no function is the essence of social media apps though, so they nailed that. 300 million monthly and 100m daily users is also pretty decent, so I don't think "nobody" cares. For comparison, that's approximately the same number of MAU as twitter.
In fairness, that success probably has less to do with the merits of the product and more to do with the timely exodus of users from Twitter and aggressive marketing from Meta’s other properties. I’m not saying Threads is bad—I haven’t used it—only that the success or failure of social media offerings is rarely attributable to the platform itself as opposed to the network of users.
Zero of whom pay for the product. They are the product.
What this leaves out is that the real customers, advertisers, already have a fully-functional hyperscale system creeping on users and pushing ads in their face. It's akin to crowing about how fast you can put up new billboard ads when the billboards are already built--someone else has already put in the power/network lines, poles, screens, routing, etc, and you just hooked up your existing ad feeds to it. Facebook doesn't give two shits if the billboards were in the middle of nowhere, as long as they can charge their advertisers some $$, they'd deliver seizure-inducing hallucinogenic drugs right into users' veins.
If you can put up new billboard ads ten times as fast as your competitors, that's still a competitive advantage. Whether you like Meta or not, two days to provide sufficient infra for a new service with 100m users is a technical achievement that not many organizations on the planet could match.
incumbent networks don't really lose. they saw potential blood in the water at the time with the rumblings of a mass exodus and made an excellent attempt at capitalizing though.
threads as a product was DOA when that didn't work. you need a network of interesting important people for it to be useful. when the migration didn't happen, you ended up with a bunch of instagram meme influencers reposting their content across two apps instead
I think their strategy, combined with an open exclusivity-bonus offer, could have given them the stickiness: $5k, $10k, $15k, etc. up front to a Twitter user who matches their follower count of at least 25k, 50k, 75k, etc. on Threads and agrees to post there exclusively for a year. People weren't getting paid on Twitter, so this would have been alluring.
Threads feels more like a new feature in the facebook meta-app (heh) than a new app in its own right, especially with how it has to be bound to an instagram profile.
"our new feature reached 100m users in 5 days" sounds a little less impressive (especially given meta has multiple billions of users to start with).
Precisely. For all the talk of efficiency the last few years, how do we even begin to measure the total waste of effort and energy of so many smart people that this was? All that effort and stress for effectively nothing, or perhaps even a net negative effect.
> Precisely. For all the talk of efficiency the last few years, how do we even begin to measure the total waste of effort and energy of so many smart people that this was? All that effort and stress for effectively nothing, or perhaps even a net negative effect.
This comment is perplexing.
Reportedly Threads has 130 million monthly active users (and growing) vs. 550 million accessing X (and shrinking). This alone places Threads among the top social media services in the world. Its projected revenue for 2026 is over $10B.
Well, the thing is, I never argued on the basis of revenue in the first place, and that second argument I wrote in response to someone else, not this person. But even if we did take revenue as the sole metric of success (remember kids, revenue != profits!), well, if the Meta CFO states that they don't see Threads as a driver of their revenue for the whole of 2025, who am I to dispute that, really? But maybe you folks have some insider insights ;)
You wrote a wall of text saying and refuting absolutely nothing.
The only remotely tangible argument you made was the blurb on “Specifically, as it pertains to monetization, we don’t expect Threads to be a meaningful driver of 2025 revenue at this time,”. Considering Meta reports revenue around $40B and Threads, considering the EU snag, was basically launched last year, this is far from being the failure you are trying to spin it.
But if you had any point worth making, and you didn't have an axe to grind, you wouldn't have grasped at straws such as the "As for the wider positive effects to the society at large, feel free to point out any."
Sure mate, I am definitely the one with an axe to grind here ;) Now prove me stupid and show me and the HN readership how a product that even its own management sees as unprofitable for the next three to four quarters is producing any benefits, at least to its shareholders, never mind society. Maybe lean into the argument with a pinch of how the latest Meta "free speech" policies augment its positive contributions too ;)
Again, the argument for overall net value is not based on a single KPI, be it DAU, MAU, or unproven/unsourced "utter global dominance". By focusing on 'dominance' instead of value, you're missing the whole point. You know who also had total communication-service dominance (we call that a "monopoly", actually) for a good chunk of the 20th century in the US? A little company called AT&T. Look them up and why they were split up. The difference being, at least AT&T was highly profitable at the time. But even if we took your argument as valid, haven't all the MBAs been telling us that unprofitable 'cost centers' should not exist and should be extinguished? Surely you must have heard that argument at some point. So why then are we allowing VC capital to provide a lifeline to some of those utterly unprofitable "businesses" indefinitely? The quotation marks around "businesses" because that's another axiom from business schools: you have a business once you start turning a profit. Isn't the point for the market to promote the winners and demote the losers? I am afraid the way Airbnb, Uber & co. operate looks a lot like a capitalist version of communist central planning committees; when we still had an independent and free press, it used to be called "crony capitalism".
DAU is a very one-dimensional and misleading KPI, one that gave us ever-growing, VC-backed unprofitable companies. Threads is definitely net-negative in the sense of wider impact, not just single-dimensional KPIs.

Take Uber and Airbnb as examples. Both have had millions of active users, and one could argue their service is even useful to a lot of people. It's still quite easy to argue that they are net-negative, because even by the narrow metric of revenue and profitability, they are not returning value to their shareholders - they turn a profitable quarter like what, every few years or so? The effects on society, such as traffic congestion and the real-estate market, are also quite visible in many major cities in the western hemisphere these days (and beyond, even...).

Now specifically for Threads: as it stands right now, it is not even driving revenue at Meta, in the corpo-speak of their own CFO: “Specifically, as it pertains to monetization, we don’t expect Threads to be a meaningful driver of 2025 revenue at this time,” said CFO Susan Li on the earnings call. “We’ve been just pleased with the growth trajectory and again, are really focused on introducing features that the community finds valuable.” (https://www.cnbc.com/2024/10/31/metas-threads-app-now-has-27...)

As for the wider positive effects on society at large, feel free to point out any. As far as I recall, they launched with the promise of sort of "sanitised" content being promoted, kind of a Pinterest 2.0 if you will; beyond that, I am not aware of anything relevant coming out of that platform, honestly.
Ach. You seem full of (unrelated, unwarranted) rage, and you stand ready to handwave away both scientific studies and actual user experience, without, I see, providing any grounded counterpoints. As a bonus, anyone who disagrees with you is a ‘tool’ of sinister ‘bad actors’.
Are you quite sure you’re on the right forum? That’s not generally how we do things here.
Not to burst your weird leftist-rightist-libertarian bubble, but vaccine passports are actually a fairly normal thing in most of the world, mate. You know, it's just about documenting what vaccinations you received throughout your lifetime. Go outside and touch grass; you sound very angry.
No mate, read your comments again: the only one projecting anything here is you. For my part, I was quite sure how to label you, and it's based on the first two rows of what you wrote, which is a mix of leftist, right-wing, and libertarian views. Now, as others have also noticed, you seem to have some deep-seated psychological issues, most notable in your rambling and unconnected thoughts. I mean this without malice: seek help.
I don't care about you - I don't even know you. But in the interest of whatever community you are a part of, I implore you to look for professional help. And put your smartphone away - it's obviously not good for you.
... speaking of vaccines "not preventing transmission"... for the love of god, please learn how vaccines work. Did they not teach you in primary school that it's about priming the body for an immune response by introducing a small dose of the pathogen (or a piece of it)? No vaccine can prevent transmission outright; its purpose is to enable your body to respond and reduce the severity of the infection's symptoms. Have education standards really deteriorated so much, or is this just the effect of overexposure to social media, I wonder?
Not being able to get your kid to sleep because the upstairs flat is now a weekend party place for tourists is… a non-problem?
‘Wonderful’ convenience for tourists, sure. Writing off reasonable objections as ‘ideological zealotry’ from ‘bad political actors’ is a very weird take.
Case in point, I recently backed out of buying a house when I found out the house next door was an Airbnb, and the neighbours were not digging the thoughtless visitors. Is that an ideological preference or a pro-sleep one?
Please read more carefully and don't put your own words in other people's mouths; it's rude and uncalled for. What zealotry? The only politically sided arguments in this thread are coming from you. By the way, if you took a few more seconds to read a bit more carefully, you would have noticed I never said they did not provide any value - only that the net negative seems to outweigh the benefits to society as a whole, and this is based on hard data, not ideology, mate. This is to make a comparison to Threads, which, unlike these two, is not even providing a real-life service to justify its existence. Plus, please learn: revenue != profits. I won't go down the rabbit hole of discussing the business models of Uber and Airbnb; you obviously haven't understood who carries the actual operating costs of those "businesses", and maybe in your world these are great services, as they come with an implicit ego boost for all the Karens of this world?
If so many smart people are still working on these products at Meta, then either these products matter (according to those smart people) or they are just ordinary people working for their paychecks.
Threads is odd. The thing about Twitter was its very high media profile; you'd see tweets quoted in all sorts of places like newspapers and TV. The sharp heel turn is in the process of killing that, especially by closing off the site to not logged in users, but you can still see people quoting/screenshotting tweets on other sites. Including bluesky.
I don't think I've ever seen a viral Threads post that escaped the network?
Meanwhile Facebook screenshots still circulate as memes (derisive, but hey, the motto of the modern era is that the world hating you is better than them not noticing you). And Instagram is fairly entrenched as a marketing channel.
Threads has a nice selection of extremely smart, interesting people, trapped inside a short text limit and an algorithmically curated culture of adversarial outrage porn and drama.
BlueSky feels a lot more adult.
Threads also has a huge oversupply of fragile mediocrities who love the drama and/or haven't worked out it's a terrible medium for self-promotion, and has generated some of the most jaw-droppingly stupid takes I've ever seen online.
(And I used to be on AOL...)
A good few people left Threads when the censorship started to become too obvious, and a lot more will leave when the ads become too heavy-handed.
The most perplexing move for me was the decision not to automatically give Instagram users a Threads account that they were logged into.
I'm sure Meta has some internal reputation metric for Insta accounts, and it would have been trivial to give all accounts above a threshold immediate access to Threads.
But nope, you have to create a Threads account to view Threads.
Is this what happens with very engineering-led businesses, perhaps? In other places, there would have been a lot more focus on the final product before shipping, but nowhere near the speed of delivery Meta achieved, which is impressive.
I'd hate to have this kind of pressure every day, but a week long sprint like this does sound like it was kind of fun. These things can _suck_ when they're out of nowhere and there's a lot of pressure, but I've also had product hack-weeks that were just an absolute blast.
That said, I'm a young single guy. If I had a family and no lead time, this doesn't sound great.
I guess its only real USP is "automatically import your Instagram friends", except that doesn't really work properly because only a fraction of people on Instagram seem to be interested in Threads.
Its fediverse integration stuff isn't panning out because nobody in the fediverse is stupid enough to let them federate. The single thread per message instead of hashtags thing doesn't seem to add a lot either.
The "product" in a social network is the community and the culture. Despite being terminally online I can't tell you what those look like for Threads. I can for FB/Insta/Twitter/Bsky/LinkedIn/Mastodon.
I think most of those are Instagram shoving it in your face. Yeah I'm a "Threads user", but only because of the inline feed in Instagram. I'm annoyed when there is a notification blip but it turns out to be Threads spam.
Threads' launch was intentionally rushed in order to capitalize on the user discontent at the time. Without a large alternative, enthusiasm to migrate to another social network would have waned. Note that Bluesky was still invite-only when Threads launched.
How many users on X (Twitter) or Bluesky are bots? It's reasonable to assume the percentages are the same, given the lack of public information for the major text-based broadcast social networks. X is estimated to have 250M daily active users. Mark Zuckerberg recently stated that Threads' DAUs were 100M. Threads achieving a bit under half the size in such a short time is impressive, especially since Threads still lacks many features that X has had for years.
> It's reasonable to assume the percentages are the same
It's reasonable to assume they're worse. Bluesky doesn't have Facebook's network or surveillance apparatus. Neither does Twitter, except it's a higher-value target than Bluesky and Threads combined.
Daily Active User (DAU) and Monthly Active User (MAU) counts represent the number of users who perform activity on the application daily and monthly, respectively.
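As a toy illustration of those definitions (the event shape and the 30-day trailing window are my assumptions, not anything from the article), counting distinct active users per period looks like this:

```python
from datetime import date, timedelta

def dau_mau(events, day):
    """events: iterable of (user_id, date) activity records.
    DAU = distinct users active on `day`;
    MAU = distinct users active in the trailing 30 days ending on `day`."""
    month_start = day - timedelta(days=29)
    dau = {u for u, d in events if d == day}
    mau = {u for u, d in events if month_start <= d <= day}
    return len(dau), len(mau)

events = [("a", date(2024, 1, 31)), ("b", date(2024, 1, 15)), ("a", date(2024, 1, 10))]
print(dau_mau(events, date(2024, 1, 31)))  # (1, 2): only "a" today, both this month
```

Real products differ mainly in what counts as "activity" (opening the app vs. posting), which is why DAU/MAU figures from different companies aren't directly comparable.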
I’d far rather work in an environment that required and enabled me to get shit done than the opposite. My last W-2 job required a quarter-long planning period, and then I had to go from team to team pleading my case and begging them not to stonewall me. Each time required a specially tailored slide deck, going behind the scenes to ask what questions _that one guy_ was going to try to derail us with so I could have an answer ready, and often a second or third follow-up meeting to address the DBA team's concerns about what effect using a WHERE clause would have.
Yesterday I sat in a meeting where someone spent an hour showing all kinds of organizational charts full of acronyms and vague terms, covering who could do what and who is the boss of whom. The various organizations were there to help employees deal with European privacy regulations while also publishing data open access when possible. Basically: which parts of the org can help you deal with contradictory government regulations.
I honestly find high pressure work more relaxing than these kinds of meetings.
You get those meetings at Meta as well ... because there are re-orgs multiple times a year and so the org leadership is sent to sell their teams on how this is now the perfect org structure. Until the next one, of course.
There's a famous poster from the Facebook days - don't mistake motion for progress. I always thought they should make one for re-orgs.
You get addicted to the adrenaline. If you’re with really good people that kind of pressure is actually fun. Sort of like how a championship sports game is fun.
I think the joy of working at this scale makes up for that for a lot of people. There are plenty of low-stress, low-impact companies that someone that's bigtech-approved can go to
It's surely not for everyone, but I think this is the wrong take. They did not detail how the rest of the days look, but one can imagine they are not launching a Threads every week/month/year. This must have been an exciting project for the folks involved.
Having been in both setups, the frustration and demotivation I got when working in the bureaucracy was way worse than doing the rare weekend work and on-call rotations at a place where things moved fast. Many people get pumped up by doing stuff.
While I was at FB (it wasn't Meta then), I saw what a superpower the infrastructure there is. Product engineers build things at scale in days. While I was there, I got to be tech lead for several different teams (2x distributed DBs, 1x Dev Efficiency, 1x Ads), some of which are called out by name here.
Shout out to the HBase and ZippyDB teams! This is the first public acknowledgment that ZippyDB was converged upon.
It's also super cool to see the Developer Efficiency pushes called out. 10,000 services pushed daily, or on every commit, is so impressive.
I find it interesting how they describe the PHP web front end as a "serverless" or "function as a service" architecture. I guess it's a matter of perspective. It's a service that has a monolithic codebase with lots of endpoints deployed to it. I guess from the perspective of the maintainer of one of those endpoints it's "serverless" but that abstraction (like all abstractions) has leaks: the teams responsible for the top endpoints and those working on shared libraries can't treat the infra as a given, but rather need to be acutely aware of its limitations and performance characteristics.
AWS Lambda: CGI But It's Trendy. Recently we've seen the rise in popularity of AWS Lambda, a “functions as a service” provider. From my perspective this is literally a reinvention of CGI, except a) much more complicated for essentially the same functionality, b) with vendor lock-in, c) with a much more complex and bespoke deployment process which requires the use of special tools.
People who don't think this is true probably never used PHP or CGI ... You don't manage the server; you just upload your code!
Architecturally, it is the same. From the user experience, it is the same.
PHP and CGI also scale infinitely, because they are stateless. Meta scales PHP to a user base of half the world's population. They could also scale CGI/FastCGI.
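The stateless property being described can be made concrete with a tiny CGI-style handler (this is my own sketch of the model, not code from the article): everything the handler needs arrives with the request, and nothing survives after it returns, which is exactly what makes the model trivially parallel across machines.

```python
def handle_request(environ):
    """One stateless request handler in the CGI style. `environ` mirrors
    the CGI environment-variable convention; any durable state would live
    in an external store (database, cache), never in this process."""
    body = "Hello, " + environ.get("QUERY_STRING", "world")
    return "Content-Type: text/plain\r\n\r\n" + body

# Any process on any machine can serve any request, in any order:
print(handle_request({"QUERY_STRING": "name=alice"}))
print(handle_request({}))
```

Scaling is then purely a scheduling problem: run more copies of the handler, since no copy depends on another's memory.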
I think the full description of “stateless, serverless functions” adds a bit of clarity. My read of this is that whatever code is running doesn’t maintain state between requests, and doesn’t touch the underlying operating system. Which seems pretty standard for highly managed environments anyways. It’s been years since I’ve written backend API code that touched the underlying system, or left objects in heap between requests.
The flexibility of knowing that any machine can instantly run the code for your API gives a lot of flexibility to rapidly scale up an API.
Nothing is “serverless” to everyone. Especially when you run the data center. But being “serverless” and even sitting above the “language runtime” gives API developers a lot of freedom to focus on business logic.
There’s only one entry point for all requests. It’s php (well, Hack) all the way down from there. So all the routing is readable as php code, request handling etc.
As I see it, it's just an abuse of terminology. By referring to Fargate as Serverless which previously referred to FaaS, they (apparently AWS's marketing team) have seemingly removed the distinction between FaaS and PaaS
Amazing. All of this truly wild, impressive technology and some of the absolute best engineers in the world, just to shove more ads in people's eyeballs. Sigh.
The tragedy of our times is that some of our brightest minds are working tirelessly, not to cure diseases or to explore the stars, but to improve the probability that someone will click on an ad by 0.1%.
Take it up with capitalism and with DARPA. If we could earn $700k doing cool science shit in white shirts and black ties for Uncle Sam like our grandfathers maybe we would.
>Amazing. All of this truly wild, impressive technology and some of the absolute best engineers in the world, just to shove more ads in people's eyeballs. Sigh.
If only that were true; it is more like building our own digital prison.
It's unfortunate there is so much cynicism and negativity in these comments. I understand many people here strongly dislike Meta but the actual article is awe inspiring to me. I didn't know how extensive and complicated the infrastructure underpinning the modern digital world was. Reading this article and seeing the sense of scale is mindblowing to me. It feels almost like a miracle and like magic. A "wonder of the world" in terms of the amount of effort and complexity.
Maybe the company in charge is bad in many ways but all of the things in the article are astounding to me.
I am not an engineer like many of you so maybe the article is old news to you guys but I couldn't help but say "wow".
I feel like if you took some old science fiction writers from back in the day and showed them this article, you would find sheer awe in their faces too.
> I feel like if you took some old science fiction writers from back in the day and showed them this article, you would find sheer awe in their faces too.
I can't tell if this is satire. The science fiction writers from back in the day wrote about humans exploring other planets, building sentient machines, encountering aliens or maybe developing psychic powers. (looking at you two, Philip K. Dick and Theodore Sturgeon)
Building the world's biggest ad-serving infrastructure might impress them. It impresses me! But "sheer awe" seems a few orders of magnitude off the mark.
I'm getting downvoted here but can't for the life of me guess why. If you've actually read Isaac Asimov and his peers, as I have, they wrote about robots with complex inner lives and space ships that traveled to alien civilizations. There is no reason whatsoever to think they would experience any more "sheer awe" at Meta's infrastructure than any other random person from their time period. I honestly think they would be disappointed that it was all done to sell advertisements to people rather than to help humanity explore the universe -- or anything else of actual value.
> In a datacenter environment, we prefer centralized controllers over decentralized ones due to their simplicity and ability to make higher-quality decisions. In many cases, a hybrid approach—a centralized control plane combined with a decentralized data plane—provides the best of both worlds.
This approach appears to be one of the best designs for software networking (service mesh) and for storage (database operations) in organizations with large server counts. I was surprised to see their IP networking follow the same model, rather than relying primarily on BGP.
It was omitted from this paper, but I would expect local caching to be used to reduce load on the L7 routers and to improve latency for database queries. Clients can invalidate caches and perform another lookup to the service mesh after a reasonable timeout (100-500 ms).
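A minimal sketch of that client-side pattern (my own illustration; the 300 ms default TTL is an assumption within the 100-500 ms range the comment suggests): cache the lookup result, and re-resolve via the mesh only once the entry has aged out.

```python
import time

class TTLCache:
    """Client-side cache for service-mesh lookups with a short TTL,
    so stale routing info self-corrects after at most one TTL period."""

    def __init__(self, ttl_seconds=0.3):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (value, expiry timestamp)

    def get(self, key, fetch):
        value, expiry = self._store.get(key, (None, 0.0))
        if time.monotonic() < expiry:
            return value          # fresh entry: skip the lookup entirely
        value = fetch(key)        # stale or missing: re-resolve via the mesh
        self._store[key] = (value, time.monotonic() + self.ttl)
        return value
```

The tradeoff is the usual one: a longer TTL cuts lookup load further but lengthens the window during which clients route to a dead or moved backend.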
Quickly developed serverless functions combined with continuous deployment, where anyone can make edits across the entire codebase, sounds like a dystopian nightmare. The amount of logging required to debug and find bugs must be extreme.

Using Erlang to write serverless functions seems like forgoing all the huge benefits BEAM can offer.
>Additionally, product engineers predominantly write code in stateless, serverless functions in PHP, Python, and Erlang for their benefits in simplicity, productivity, and iteration speed.
>To boost developer productivity, Meta has adopted continuous deployment universally and enabled more developers to write serverless functions rather than traditional service code.
Yes it’s dangerous. That’s why the hiring bar is very high, and everyone spends their first 8 weeks in training bootcamp before being let loose in their real project team.
> The amount of logging that is required to debug and find bugs is extreme.
Ten or so years ago I remember going to a larger technical recruiting pitch at Facebook where they discussed their logging complex (I have only a vague recollection of the details). Honestly it was one of the most beefy implementations of system/application logging I remember having seen at the time.
It powers the world’s best experimentation platform too. Which is their business superpower. Competitors can spend a year thinking about what to do. FB has already tried and measured all options and pivots to the optimal.
I have been very disappointed since moving from FB to Google at how much worse the data and measurement platform is here.
> Moreover, for non-AI compute workloads, we offer only a single server type, equipped with one CPU and the same amount of DRAM (previously 64GB, now 256GB).
I'm reading something like this for the first time. Is this common across the industry, or is it unique to Meta? In contrast, we use multiple instance types for the sub-components of our ML training pipeline.
At least in my experience in a somewhat bespoke industry on the non-cloud side, things have been moving in that direction due to most of the reasons they identify (it also ties nicely into the global "Datacenter as a Computer" concept they hit on).
People in my industry have a tendency to over-optimize (technical folks love nothing more than to tinker), and in doing so create very unique deployments per group that require a significant amount of infra and operational support, and drastically slow down the rate of progress/change (not to mention the cost). When you peel back all the requirements it turns out we really only need three unique deployment options. Makes it all significantly less cumbersome.
One reason for using a single CPU is that the number of cores per CPU has become very high, and a single socket avoids a lot of complexity around Non-Uniform Memory Access (NUMA), where for best performance the OS should allocate data in the RAM that is "closest" to the CPU using it.
Not to my knowledge, no. If it's a virtualized server, maybe this works, but to my knowledge most companies try to maximize their hardware usage through a variety of slices/sizes based on underlying resources. I am not a datacenter expert however.
> the image is not cached at CDN109 when the user requests it, CDN109 forwards the request to a nearby PoP. The PoP then forwards the request to the load balancer in a datacenter region, which retrieves the image from the storage system.
Say I want a 1MB image, wouldn't it be faster to serve me the 1MB image over a slow connection with 100ms latency, than going through multiple hops of increasing latency, with multiple round trips?
Say I request the image directly:
me -- 100ms --> datacenter
datacenter -- 100ms --> me
Say I now go through Meta's system, assuming that goes to the same Datacenter in the end, and there's no FTL tech:
DC-to-DC networking is typically much higher throughput, and often lower latency (due to fewer hops).
Consider also that the CDN can fetch the whole image async after the first request, but the nature of TCP (that you'll be using to fetch objects) is serial, meaning you can only fetch about 1k each round-trip.
Now, in reality the situation is way more complex, because various hacks have been added over the years, for example your browser might grab the size header and just create a bunch of download threads for your image with various offsets, the TCP scaling window might come into play; but largely most people set 1500b for their MTU on consumer hardware, and there's some overhead from TCP/IP and so on.
Serving an image over HTTPS implies initiating the TCP connection (which requires a minimum of 3 packets) and the TLS connection (which requires many more, let's say 10).
CDN <-> PoP <-> Datacenter communication doesn't require initiating connections. They reuse them because they centralize requests to serve different users (this is even explicitly called out in the article).
Let's say your closest datacenter is 50ms away, and the closest PoP/CDN node is 10ms. Just initiating the image download over HTTPS is all of a sudden 500ms vs 100ms.
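A back-of-the-envelope sketch of that claim (a hypothetical model, taking the comment's rough figure of ~10 serialized round trips' worth of TCP + TLS setup before any image bytes flow):

```python
# Hypothetical sketch, not a protocol-accurate packet trace: connection
# setup cost scales linearly with RTT, which is why terminating TLS at
# a nearby PoP/CDN node pays off even if the origin is far away.
def setup_ms(rtt_ms, setup_round_trips=10):
    # ~10 round trips is the comment's rough figure for TCP + TLS setup
    return rtt_ms * setup_round_trips

print(setup_ms(50))  # 500 -> handshaking with the far datacenter
print(setup_ms(10))  # 100 -> handshaking with the nearby PoP/CDN
```

With TLS 1.3 (one handshake round trip) the constant shrinks, but the linear dependence on RTT, and therefore the advantage of the nearby node, remains.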
Sure, PoP/CDN might need to go to the datacenter to fetch the image (and only if the content is not cached there already) but that only happens once before it gets cached, and there's still a lot of ms to use on that to make the tradeoff worth it.
The connection between CDN and PoP and data center is much faster than you think.
You don't need FTL tech. You just need good private fiber. Geographic distance is hardly ever the limiting factor on the public internet; yet it often is for private networks.
This is a reasonable question, but I think you've made some wrong assumptions.
Typically, the CDN nodes will be within your ISP's network, typically on the way to the PoP (or at the PoP). You're unlikely to see a significantly different round trip time if your request is proxied through the CDN vs going to the remote datacenter directly. But, even if your total round trip time is a little longer (10ms in your hypothetical), you get benefits from local TCP and/or TLS termination.
At ~100ms latency, your effective bandwidth is usually limited more by round trips than by actual connection bandwidth, because of congestion control / slow start. A 1 MB image is going to be somewhere around 700 packets with a ~1500 MTU. Assuming standard congestion control, the initial congestion window is 10 packets, and for each packet acked, the server can send two packets. Assuming you, the server, and the path between have infinite bandwidth and a large enough receive window, the client would receive 10 packets at t=100ms after the initial request, then 20 additional (total 30) at t=200ms, ... 320 additional (630 total) at t=600ms, and the remainder at t=700ms. If your effective bandwidth is less than about 40 Mbps, you're going to hit congestion in the 6th bunch of packets, but at any connection speed above that, your 1 MB transfer is going to take 700 ms. If you've got a much smaller MTU, you might need more round trips; and if you've got a larger MTU, you could end up with fewer round trips, but word on the street is that inter-AS jumbo packets are rare, and Path MTU Discovery isn't great, so lots of servers force an effective max MTU of 1500 or less, because it's easier to send smaller packets to everyone than to fix path MTU issues.
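The slow-start arithmetic above can be sketched as follows (an idealized model, as in the comment: the window doubles every round trip, with infinite bandwidth and no loss):

```python
def slow_start_time_ms(total_packets, rtt_ms, initial_window=10):
    """Idealized slow start: every ACKed packet clocks out two more,
    so the in-flight window doubles once per round trip."""
    delivered, window, round_trips = 0, initial_window, 0
    while delivered < total_packets:
        delivered += window  # packets arriving in this round trip
        window *= 2          # window doubles for the next one
        round_trips += 1
    return round_trips * rtt_ms

# 1 MB is ~700 packets of ~1460 B payload at a 1500 MTU
print(slow_start_time_ms(700, 100))  # 700: seven round trips at 100 ms
```

Cumulative delivery runs 10, 30, 70, 150, 310, 630, 1270 packets, matching the bunches described above: the 700th packet lands in the seventh round trip.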
But if we have your steps of 10 ms to the CDN, 10ms to the PoP, 90 ms to the datacenter, and let's assume the transit time between you and the CDN and you and the PoP is symmetric, we get
Your request starts at t=0, the CDN-to-PoP request starts at t=5 ms, the PoP-to-DC request starts at t=10 ms, then the PoP-to-DC transfer takes 7 round trips = 630 ms for all data, and you get that 10 ms later at 650 ms. Your first byte of response is delayed by 10 ms, but it's worthwhile because time to last byte decreases by 50 ms. If the transit times between hops are asymmetric, the times at which the client gets data don't change, but the math makes my head spin more.
If you change it up, and say you're 50 ms round trip to the CDN, which is 50 ms round trip to the PoP, which is 50 ms to the DC, first byte jumps to t=150 ms, but time to last byte would be 7 * 50 ms + 100 ms = 450 ms.
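Both scenarios fall out of the same formula: one full RTT per proxy hop (request out, data back), plus the transfer's round trips running on the short PoP-to-datacenter leg. A sketch under the same idealized assumptions (7 round trips for a 1 MB transfer, no processing time at hops):

```python
def proxied_last_byte_ms(hop_rtts_ms, dc_rtt_ms, transfer_round_trips=7):
    # Each proxy hop costs one full RTT end to end (request out, data
    # back); slow start's round trips all happen on the final short leg.
    return sum(hop_rtts_ms) + transfer_round_trips * dc_rtt_ms

print(proxied_last_byte_ms([10, 10], 90))  # 650, vs 7 * 100 = 700 direct
print(proxied_last_byte_ms([50, 50], 50))  # 450
```

The win grows as the congestion-controlled round trips move onto a shorter leg, which is exactly the point of terminating near the user.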
And, if the PoP -> DC connection happens over a warm socket, with an appropriate congestion window and receive window, the transfer from the DC to the PoP can happen much faster. Certainly, you could potentially have a warm connection to the data center from the client as well, but most services don't want to have millions or billions of client connections to all their datacenters, so running them through PoPs or CDNs can be pretty handy.
There's some handwaving here (I ignored processing time at each hop, but it's usually pretty low), but it's really helpful to process congestion control closer to the user. Any lost packets can be resent sooner for much quicker recovery if managed locally as well. And for things that are cacheable, read-through caching with a CDN -> PoP -> Datacenter approach makes a lot of sense to reduce demand on the datacenter, benefiting from likely locality of reference --- people in the same area / on the same ISP are likely to fetch images that others in their area have fetched.
I left before they were Meta, and maybe things have changed, but I don't think they have any intention of being a public cloud. Yes, they've got a lot of similar services as a public cloud, but there's a lot of opinionated choices that make sense for them that I think would be hard to convince customers to accept.
Their infrastructure is cloudy, but it's built around mostly a single customer and assumes the infrastructure software people and the application software people communicate deeply and continuously. Running on a public cloud isn't that similar, at least as a small customer.
Could they pivot towards being a cloud service? Probably, but they'd need to do a lot of work to make their platform viable and to earn trust of potential customers, and they'd be entering a crowded market; there's already 6 S&P100 companies in Cloud (Amazon, Google, Microsoft, Oracle, IBM, Salesforce), and tons of smaller players.
IMHO, given their revenues and profit margins, there's no reason to do all the work it would take to offer cloud services too, unless there's some opportunistic large customer deal made. They might also need to renegotiate their content node agreements if they use them to serve cloud customer traffic, and that's a long process.
The infra is not isolated enough to be anywhere near ready for public offerings.
Sure, there is process isolation, and it's hard to break out of the jail/container/whatever you want to call the unit of execution. But it's just not ready for public consumption.
The methods for monitoring, creating, deploying and scaling the "service(s)" are just too intertwined with internal access. Whilst there is sorta fine-grained control over which other services you have access to, it's nowhere near AWS et al.
The _other_ thing is that everything needs to be compiled to fit the platform. It's not run in VMs; it's bare-metal processes, with some rather fancy shims to isolate away libc (don't ask me more; I know it's there and some of the reasons why, but the mechanics are a mystery to me).
That platform that you compile to is a movingish target.
So no, Meta isn't going to host third-party stuff, mainly because Meta doesn't really have enough capacity for what it wants to do now, let alone add more consumers.
Offering a Heroku level of deployment abstraction to one organization's own software engineers, while maintaining performance, is an amazing achievement. Developing a cloud product and all the packaged services with account separation, autoscaling, and multiple regions is another massive endeavor.
Think of it as the difference between OCI and AWS.
Meta would be unlikely to launch a public cloud that couldn't compete with Amazon's feature set.
GCP got this reputation because it’s a second class citizen within Google. Google’s own internal infra (Borg, Blaze) is top-notch.
If Meta can pull off the public cloud correctly, I’d trust them greatly - they’ve shown significant engineering and product competence till now, even if they could use some more consistent and stable UI.
A lot of Fortune 50 companies compete in some way with Microsoft, Amazon, and Google, so they are reluctant to use their services. I think Meta has the benefit of less corporate competition.
Anyway in that context, you don't need trust when you have a sufficiently large legal department
It would make sense for them to do so if they have variable load but a lot of hardware waiting for action, like Amazon had back when - busy in the evenings, peak load around the holidays, crickets at night. But if they had that, I'm confident they would have been renting out servers a long time ago now. I wonder if they themselves are customers of the cloud providers.
Not a word about Thrift, perhaps it was too low level for an infra overview, but I would have expected it to make some technical impact from a global perspective.
I wouldn't be surprised if they have switched to GRPC for improved performance. They mentioned that RPC libraries are centrally maintained in their monorepo; a migration from Thrift to GRPC might have taken less than 6 months.
gRPC and Thrift are comparable in performance and there is actually an opposite trend of switching from gRPC to Thrift in the few places where the former is still used.
I participated in a painfully slow migration from Thrift to gRPC. I did not record the performance metrics, but it was internally advertised to be significantly more performant. There are still some Thrift services running at the organization, but most were migrated to gRPC and certainly not migrated back.
At least half their gak is due to them NOT moving quickly and NOT wanting to break things.
IIRC, graphql is a means of papering over a bunch of legacy APIs. They removed foreign keys from mysql using it as a column store db, a vestige of the original LAMP stack still on PHP.
I don't think Meta infrastructural choices are applicable to most folk.
What does serverless land your average dev? A high AWS bill.
Elastic managed Kubernetes stack? A higher bill.
Did you know that you can use YAML and provision actual cloud provider resources with boring tech? Welcome to Ansible.
There is no need to recreate Linux network stack when you have the Linux network stack, and it actually works!
Quite a lot of hacky gak is required when you run node.js as a production public facing web service. A statically compiled binary won't invent novel code execution paths 4 days into a memory leaking runtime bender.
Boring tech is boring, I guess, even if it's new and shiny.
Facebook creates tech to mitigate the pathologies their past continuously present.
> Facebook creates tech to mitigate the pathologies their past continuously present.
Remember when they hacked a running Android Dalvik machine because their organizational constraints were such that they could never remove code or delete unused classes?
Facebook seems like a place where they do amazing engineering to temporarily stave off the disastrous consequences of their previous feat of amazing engineering.
> Did you know that you can use YAML and provision actual cloud provider resources with boring tech? Welcome to Ansible
Anyone using Ansible for cloud infrastructure management is not to be taken seriously. It's among the worst tools for the job - not (always) idempotent, no state tracking, slow, very limited in the resources it can manage, very limited templating, fun stuff like "state: absent", running, and then having to remove the corresponding lines to delete, etc etc. You're literally better off bash scripting the cloud provider's CLI than using Ansible. Terraform/OpenTofu, Pulumi/tfcdk if you hate your future self, are just clearly so much better.
I was making a point about provisioning VPSes instead of trying to wrangle postgresql restores inside kubectl or equivalent, of how your cloud provider is already provisioning a single physical server via a hypervisor.
I was making a point that facecrook overengineering is about them being boxed into corners, and about how very little of big tech's solutions translate to real-world usage in the web industry, in which I have been taken very seriously for over 30 years.
You read 'ansible recommended', which I could also argue with you about, but I shan't.
Consider your shrill absolute dismissal ("anyone! Not to be taken seriously!"), as if Ansible has been proven wrongful in a court of law, as if your statements are law and binding. Get over yourself, for everyone's sake.
Downvoting me into oblivion over a well-deserved opinion is rather aggressive, and it makes this community unpleasant.
I am not clueless and don't like being treated as if I am.
> I am not clueless and don't like being treated as if I am.
You made a clueless comment which I tried to (constructively) dismiss. If you don't want to be treated as clueless, don't recommend the equivalent of using a hammer to peel potatoes.
What is your actual point? You are criticizing Meta for building complex systems by iterating on boring tech like PHP and MySQL, and instead suggesting that they build their systems on top of different boring tech like YAML and Ansible. Why? The fact is that there are no "boring" off-the-shelf solutions for solving the problems that Meta is facing like whole region failures and performant cross-datacenter routing/sharding.
Your comment feels like it's not actually engaging with the contents of this article. It's not that Meta is creating bespoke technologies only out of fear of breaking past code. Their entire methodology of innovation is highly iterative and grounded in feedback through practical demonstration of results. You say that "Facebook creates tech to mitigate the pathologies their past continuously present", as if that is a bad approach, but considering Meta's success, I think it would be wise to seriously reconsider that position.
This isn't specific to Meta/Facebook in any way. A very large percentage of big tech companies use MySQL without foreign key constraints, because they're problematic at scale.
To be clear, foreign key constraints are not used, but it's still very much a relational use-case / workload.
> using it as a column store db
That's nonsensical. Column stores are for analytics, whereas MySQL is used for OLTP. Both InnoDB and MyRocks are row-oriented storage engines.
Maybe you actually meant "key-value store", which is a more common claim, but that's still completely wrong. The query pattern for Facebook's social graph relies heavily on range scans over indexes, which isn't a concept supported by key-value stores. And there are many MySQL workloads at Facebook outside of the social graph database tier, with extremely varied use of MySQL functionality.
As much as I hate clownfare, it would be very interesting if they published something like that because of the sheer number of data centers they operate
They have published a lot of information on their blog[1]. It's piecemeal, and around other articles about optimisations or security fixes or failures/postmortems, but it's there. Stuff like:
Seriously though, this is painful to watch. Reminds me of those articles/videos where people have to have their jobs "invest" or software/hardware partners in that website and all of a sudden this person is a "Person of Interest".