Because boy, there is a lot to read.
- XLDB-2015 Conference Program
XLDB events are uniquely targeted gatherings that discuss the real-world challenges, practical considerations, and nuts-and-bolts solutions in the realm of managing and analyzing extreme scale data sets. Attendees include Big Data users from industry and science, developers, researchers, and providers.
- How Not To Use a Cluster
Git assumes it is on a filesystem that supports atomic operations, in particular atomic rename. Gluster tries hard, but Git can produce a race condition that causes split brain. Even with 3 replicas this fails.
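The property at stake is that a rename replaces its target in one step, so a reader sees either the old file or the new one and never a half-written mix. Git relies on this when it writes a lock file and then renames it into place. Below is a minimal Python sketch of that write-then-rename pattern, not Git's own code. `atomic_write` is a hypothetical helper, and the guarantee assumes a filesystem that implements POSIX `rename(2)` atomically, which is exactly what the excerpt says Gluster struggles to provide under races.

```python
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Replace `path` so readers see either the old contents or the new ones (hypothetical helper)."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live in the same directory: rename is only atomic within one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # push the bytes to disk before the rename makes them visible
        # On POSIX, os.replace is a rename(2): the directory entry is swapped in one step.
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)  # never leave a stray temp file behind on failure
        raise
```

On a local POSIX filesystem this is enough; on a replicated filesystem the same code is only as safe as the filesystem's rename.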
- Mesos, Omega, Borg: A Survey
Cluster schedulers existed long before big data. There's a rich literature on scheduling on 1000s of cores in the HPC world, but their problem domain is simpler than the one addressed by datacenter schedulers, meaning Mesos/Borg and their ilk. Let's compare and contrast on a few dimensions.
- Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center
Mesos decides how many resources to offer each framework, based on an organizational policy such as fair sharing, while frameworks decide which resources to accept and which tasks to run on them. While this decentralized scheduling model may not always lead to globally optimal scheduling, we have found that it performs surprisingly well in practice, allowing frameworks to meet goals such as data locality nearly perfectly. In addition, resource offers are simple and efficient to implement, allowing Mesos to be highly scalable and robust to failures.
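To make that division of labour concrete, here is a toy Python simulation of resource offers. The master decides whom to offer resources to under a simple fair-sharing rule (smallest CPU allocation first), and each framework decides what to accept based on data locality. Everything here (`Framework`, `run_offers`, one CPU per task, the locality rule) is a hypothetical simplification, not Mesos's API or its actual allocator.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Framework:
    name: str
    preferred_nodes: set[str]  # hypothetical: nodes that hold this framework's input data
    pending_tasks: int
    allocated_cpus: int = 0
    placements: list[str] = field(default_factory=list)

    def respond(self, offer: dict[str, int]) -> Optional[str]:
        """Framework side: pick a node for one 1-CPU task, or decline if nothing is local."""
        if self.pending_tasks == 0:
            return None
        for node in sorted(offer):
            if offer[node] > 0 and node in self.preferred_nodes:
                return node
        return None


def run_offers(free_cpus: dict[str, int], frameworks: list[Framework]) -> None:
    """Master side: keep offering all free CPUs to the framework with the smallest allocation."""
    declined: set[str] = set()
    while sum(free_cpus.values()) > 0:
        candidates = [f for f in frameworks
                      if f.name not in declined and f.pending_tasks > 0]
        if not candidates:
            return
        fw = min(candidates, key=lambda f: f.allocated_cpus)  # fair sharing by CPU count
        node = fw.respond(dict(free_cpus))  # the framework sees a copy of the offer
        if node is None:
            # Free CPUs only shrink, so a framework that declined now would decline again.
            declined.add(fw.name)
            continue
        free_cpus[node] -= 1
        fw.allocated_cpus += 1
        fw.pending_tasks -= 1
        fw.placements.append(node)
```

With free CPUs `{"a": 2, "b": 2}`, a framework that only wants node `a` gets one slot there and then declines, while a framework that accepts `a` or `b` picks up the other three. The master never reasons about locality; the framework's accept/decline choice is what enforces it.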
- Dude, where’s my metadata?
The replication protocol in ZooKeeper assumes that servers can crash and recover, and the ensemble can make progress as long as f + 1 members are up and running, where f is a bound on the number of faulty servers determined by the size of the ensemble. If the ensemble has 5 servers, then the ensemble can make progress as long as 3 are up (f = 2). It additionally assumes that the disk state before a server crash is there during recovery. This post is pointing out that this isn’t always the case, and it is an issue to be aware of. Note that this isn’t unique to ZooKeeper, but I’ll focus on ZooKeeper because I know it well.
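The quorum arithmetic, assuming the usual ensemble of 2f + 1 servers: progress needs a strict majority, so at most f = (n - 1) // 2 servers may be down. A small Python sketch; the function names are illustrative only.

```python
def max_faults(n: int) -> int:
    """Largest f with n >= 2f + 1: how many servers may be down while a majority remains."""
    return (n - 1) // 2


def quorum_size(n: int) -> int:
    """Smallest strict majority of an n-server ensemble."""
    return n // 2 + 1


def can_make_progress(n: int, servers_up: int) -> bool:
    """True while at least a majority of the ensemble is up."""
    return servers_up >= quorum_size(n)
```

For n = 5 this gives f = 2 and a quorum of 3, matching the example above. For even n the majority is larger than f + 1 (a 4-server ensemble needs 3 up and still tolerates only one failure), which is why odd ensemble sizes are the norm. The count also presumes, as the post stresses, that a recovered server comes back with its pre-crash disk state.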
- Braess-like Paradoxes in Distributed Computer Systems
One might expect that the total processing capacity of a system increases when the capacity of a part of the system increases, and so expect corresponding improvements in performance objectives. The famous Braess paradox tells us that this is not always the case; i.e., increasing the capacity of a part of the system may sometimes degrade the benefits of all users in a Wardrop equilibrium. We can expect a similar type of paradox to occur in the Nash equilibrium (with large N), i.e., increasing the capacity of a part of the system may degrade the benefits of all classes in a Nash equilibrium whenever it does so in the Wardrop equilibrium. We call it the Braess-like paradox.
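The paper works with load-balancing models of distributed computer systems. The sketch below instead uses the classic road-network instance of Braess's paradox to show what a Wardrop equilibrium is and how added capacity can hurt everyone. The numbers (4000 drivers, two congestible links costing load/100 minutes, two fixed 45-minute links, and a zero-cost shortcut from A to B) are the standard textbook example, not taken from the paper.

```python
# Classic textbook instance of Braess's paradox (not the paper's computer-system model).
N = 4000  # total drivers


def route_costs(flows: dict[str, float], shortcut: bool) -> dict[str, float]:
    """Minutes of travel on each route, given how many drivers take each route."""
    # Congestible links: Start->A carries routes SAE and SABE; B->End carries SBE and SABE.
    load_sa = flows.get("SAE", 0) + flows.get("SABE", 0)
    load_be = flows.get("SBE", 0) + flows.get("SABE", 0)
    t_sa = load_sa / 100
    t_be = load_be / 100
    costs = {"SAE": t_sa + 45, "SBE": 45 + t_be}  # A->End and Start->B take a fixed 45 minutes
    if shortcut:
        costs["SABE"] = t_sa + 0 + t_be  # the added zero-cost link A->B
    return costs


def is_wardrop_equilibrium(flows: dict[str, float], shortcut: bool, tol: float = 1e-9) -> bool:
    """Wardrop condition: every route in use is a cheapest route, so no driver gains by switching."""
    costs = route_costs(flows, shortcut)
    best = min(costs.values())
    return all(costs[route] <= best + tol for route, drivers in flows.items() if drivers > 0)


before = {"SAE": N / 2, "SBE": N / 2}  # even split, no shortcut
after = {"SABE": N}                     # everyone uses the shortcut
```

Without the shortcut, the even split costs every driver 65 minutes and is an equilibrium. With the shortcut, that split stops being an equilibrium; everyone drifts to Start -> A -> B -> End and pays 80 minutes, yet no single driver can do better by switching back (the other two routes now cost 85).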
- What High-Bandwidth Memory Is and Why You Should Care
The processors of the future can be mindblowingly, blazingly quick and it won't make the slightest bit of difference if memory can't keep up. A CPU/GPU needs quick access to its addressable memory banks—RAM, generally—because, otherwise, speed is just a number.
- Choose Boring Technology
Adding technology to your company comes with a cost. As an abstract statement this is obvious: if we're already using Ruby, adding Python to the mix doesn't feel sensible because the resulting complexity would outweigh Python's marginal utility. But somehow when we're talking about Python and Scala or MySQL and Redis people lose their minds, discard all constraints, and start raving about using the best tool for the job.
Microservices are a useful architecture, but even their advocates say that using them incurs a significant MicroservicePremium, which means they are only useful with more complex systems. This premium, essentially the cost of managing a suite of services, will slow down a team, favoring a monolith for simpler applications. This leads to a powerful argument for a monolith-first strategy, where you should build a new application as a monolith initially, even if you think it's likely that it will benefit from a microservices architecture later on.
- 2014 ACM Turing Award
Rather than wasting time ranting and railing at the elephants (although he did some of that as well), he just built companies that showed the ideas worked well enough to sell successfully against the elephants. Not only did he build companies, but he also helped break the lock of the big three on database innovation, and many database startups have subsequently flourished. We’re again going through a golden age of innovation in the database world. And, in large part, this new period of innovation has been made possible by work Stonebraker did.
- Distributed Systems Are a UX Problem
Distributed systems affect the user. We need to shift the focus from system properties and guarantees to business rules and application behavior. We need to understand the limitations and trade-offs at each level in the stack and why they exist. We need to assume failure and plan for recovery. We need to start thinking of distributed systems as a UX problem.
- Backwards digital thinking
I had to read that a couple of times. Turcke’s daughter had used a paid service to get content, and her parent was telling her that it was stealing. I am not sure what it is, but it is not stealing. If you pay the same amount for something as an equivalent person across the border, that is not stealing. The message Turcke was teaching her daughter, and was now trying to tell all Canadians, was that they were second class. They couldn’t get the content that others could get because companies like Bell Media had paid for them not to do so.
- Facebook and PGP
Can they find a way to make PGP easy to use? That encompasses a wide range of activities: composing encrypted and/or signed email, receiving it and immediately realizing its status, being able to search encrypted messages—and doing all this without undue mental effort. Even for sophisticated users, it's really easy to make operational mistakes with encrypted email, mistakes that gut the security.
- Backwards compatibility is (still) hard
The two features we really want from C# here are:
- Some way of asking the generated code to perform dynamic overload resolution at execution time… not based on dynamic values, but on the basis that the code we’re compiling against may have changed since we compiled. This resolution only needs to be performed once, on first execution (or class load, or whatever), as by the time we’re executing, everything is fixed (the parameter names and types, and the argument names and types). It could be efficient.
- Some way of forcing any call sites to use named arguments for any optional parameters. (Even though in our case all the parameters are optional, I can easily imagine a case where there are a few required parameters and then the optional ones. Using positional arguments for those required parameters is fine.)
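The post is about C#, but Python already has an analogue of the second feature: parameters after a bare `*` are keyword-only, so every call site must name them. A hypothetical sketch (`create_report` and its parameters are made up for illustration) of why that helps compatibility: version 2 inserts a new optional parameter before the old ones, and existing call sites keep working because none of them depended on position.

```python
def create_report(title: str, *, include_summary: bool = True, page_size: int = 50) -> dict:
    """v1: `title` is required and may be positional; everything after `*` must be named."""
    return {"title": title, "include_summary": include_summary, "page_size": page_size}


def create_report_v2(title: str, *, locale: str = "en", include_summary: bool = True,
                     page_size: int = 50) -> dict:
    """v2: a new optional parameter inserted first; named call sites are unaffected."""
    return {"title": title, "locale": locale,
            "include_summary": include_summary, "page_size": page_size}


# The same call site works against both versions.
v1 = create_report("Q1", page_size=10)
v2 = create_report_v2("Q1", page_size=10)

# Passing an optional parameter positionally is rejected outright.
try:
    create_report("Q1", False)
except TypeError:
    positional_rejected = True
else:
    positional_rejected = False
```

Python enforces this when the call executes rather than at compile time, so it is an analogy for what the post asks of C#, not a drop-in answer.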
- “Troldesh” – New Ransomware from Russia
By the end of our correspondence, I managed to get a discount of 50%. Perhaps if I had continued bargaining, I could have gotten an even bigger discount.
- States Seek Better Mousetrap to Stop Tax Refund Fraud
"On a high level, what we’ve determined as of this week is that, unless the lobbyists derail our efforts, we’re going to ask for different authentication measures on a new customer and different ones on a returning customer, and then we’re going to ask for a whole bunch of data elements that we’re not getting now that will allow us to filter the returns on receipt and will allow us to put the returns in various buckets of scores for possible fraud."
For example, one telltale sign of a fraudulent return is one that takes the filer a very short time to fill out.
"If someone takes two minutes or less to fill out a tax return, that’s pretty much fraud 100 percent of the time, because they’re just cutting and pasting information from somewhere else," said Magee’s deputy Garrett. "So we said, okay, send us information about how long it takes them to fill out a return."
- Phony Tax Refunds: A Cash Cow for Everyone
When the money was deposited into the Sunrise account, TPG extracted three fees: $35 for handling the federal refund, $10 for state refunds, and a $10 fee for TurboTax (since thieves had used TurboTax to fraudulently file his request).
Another $2,000 from the refund was diverted to an Amazon gift card. For thieves, diverting some of the funds to Amazon hedges their bets in case somehow the prepaid card that receives the bulk of the funds gets canceled by authorities cracking down on tax return fraud. These gift cards also are easily resold for cash.
"For Amazon, it guarantees a flow of future purchases in the Amazon system, and potentially generates more profit as consumers often forget to use all the value on their gift cards," Garrett said.
- How FIFA Explains the World
Ever since it assumed its hegemonic role in the ashes of World War II, America has played the part of what Josef Joffe calls the “default power” in maintaining global peace and security. It’s why Israelis and Palestinians both trust America, and not, say, Russia or Saudi Arabia, as an “honest broker” in peace negotiations, why the Europeans long ago consented to American security dominance on their continent, and why nearly all Asian countries appeal to us in containing an expansionist China.
- I Sued the Grateful Dead
To my wonderment, a check for $16 arrived in the mail about a week later. No explanation; no apology; and obviously no need for Bill Graham’s lawyers to waste their time arguing with a disappointed hippie law student. That was not good enough for me, however, as there was still the matter of my $2.25 filing fee, which I was entitled by law to recover.