comment by thundara
thundara  ·  4644 days ago  ·  link  ·    ·  parent  ·  post: Massive non-parallel core programming
PyPy's been working on this with its software transactional memory research. I'm curious whether this will lead to analyzing the probability of inconsistent writes/reads in code and to the development of algorithms that minimize them.

Edit: I should note, though, that STM still gives you some determinism in code. It just causes sections to be re-run when they notice that their memory was changed underneath them.
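To make the "re-run when memory changed underneath" idea concrete, here is a minimal sketch of an optimistic retry loop. It is not PyPy's STM implementation; `VersionedCell`, `atomically`, and `run_demo` are hypothetical names invented for illustration, and a version counter stands in for real conflict detection.

```python
import threading


class VersionedCell:
    """A value plus a version counter; a toy stand-in for STM-managed memory."""

    def __init__(self, value):
        self.value = value
        self.version = 0
        # Guards only the short read and commit steps, not the user's computation.
        self._lock = threading.Lock()

    def read(self):
        with self._lock:
            return self.value, self.version

    def try_commit(self, expected_version, new_value):
        """Commit only if nobody else committed since we read."""
        with self._lock:
            if self.version != expected_version:
                return False  # memory changed underneath us
            self.value = new_value
            self.version += 1
            return True


def atomically(cell, update):
    """Re-run `update` until it commits without a conflict; return retry count."""
    retries = 0
    while True:
        value, version = cell.read()
        new_value = update(value)  # speculative work, done without holding the lock
        if cell.try_commit(version, new_value):
            return retries
        retries += 1


def run_demo(n_threads=4, increments=1000):
    """Several threads increment one cell; conflicts are resolved by re-running."""
    cell = VersionedCell(0)

    def worker():
        for _ in range(increments):
            atomically(cell, lambda v: v + 1)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return cell.value
```

The final value is the same on every run even though the number of retries is not, which is the sense in which the result stays deterministic.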

Edit#2: And he addresses the idea of probabilistic algorithms, too, cool! Neat article!





cliffelam  ·  4644 days ago  ·  link  ·  
Very cool.

I've been saying for a while that we treat every problem as if it were a banking problem - ACID, etc, etc. And that most problems are mostly unlike banking. Glad to see this sort of discussion.

-XC

thundara  ·  4644 days ago  ·  link  ·  
Well, for a lot of people, the mental shift towards non-deterministic code can be a bit troublesome. I've seen plenty of programs that failed because they didn't bother to use locks and tried to poke a bit of memory after it had been freed by another thread.

On the other hand, there are problems that are inherently unsolvable with locks because they just aren't available, like network programming.

Other problems are difficult to imagine without some sort of central synchronization. If you're running reddit, how do you store comments/replies without using one central database? What if someone replies to a comment that was stored on server X and server Y receives that request before server X tells it about the first comment?

Should it error? Or just store the comment in that inconsistent state and worry about resolving/cleaning up dangling references later?
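A rough sketch of the second option, storing the reply in its inconsistent state and resolving it later. This is only an illustration of the idea, not how reddit or any real system does it; `CommentStore` and its methods are hypothetical, and a single in-memory dict stands in for one server's view.

```python
class CommentStore:
    """One server's view of the comments it has heard about so far."""

    def __init__(self):
        self.comments = {}  # comment_id -> (parent_id, text)
        self.pending = {}   # missing parent_id -> ids of replies waiting on it

    def add(self, comment_id, parent_id, text):
        """Store a comment even if its parent has not arrived yet.

        Returns the ids of earlier replies that this comment just resolved.
        """
        self.comments[comment_id] = (parent_id, text)
        if parent_id is not None and parent_id not in self.comments:
            # Dangling reference: remember it instead of raising an error.
            self.pending.setdefault(parent_id, []).append(comment_id)
        return self.pending.pop(comment_id, [])

    def dangling(self):
        """Ids of stored replies whose parent is still unknown here."""
        return sorted(cid for waiting in self.pending.values() for cid in waiting)
```

For example, if the reply reaches this server before the comment it answers, `add("reply", "first", ...)` succeeds and `dangling()` lists `"reply"` until `add("first", None, ...)` arrives and clears it.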

cliffelam  ·  4644 days ago  ·  link  ·  
Well, not to slam Reddit on hubski.... But who cares? Seriously, if I miss a comment or two comments get out of order, but the code is 10x faster and easier to maintain, let's put that in the win column and move on.

Google results are different from different data centers at the exact same moment because of code and crawler propagation. Seems to work fine for everyone.

Most software communicates with TCP instead of UDP and that seems to work fine.

-XC

thundara  ·  4644 days ago  ·  link  ·  
It was less about reddit than a generic imagined scenario of running a large, distributed message-board system, and how you just have to think a bit more and be aware of the trade-offs that you can (and probably should) make.

Also, I wasn't referring to TCP/UDP, but rather to large networks of computers, where things regularly break and it's hard to get the atomic synchronization that locks give without throwing performance out the window. Algorithms like MapReduce are built to mitigate the fact that a fraction of servers in a cluster will regularly fail/stall.
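As a toy illustration of that last point, here is a sequential sketch of re-running map tasks that fail. It is not the real MapReduce scheduler; `map_chunk`, `word_count`, and the `failures` set of (chunk index, attempt) pairs are assumptions made up to simulate flaky workers deterministically.

```python
from collections import Counter


class WorkerFailed(Exception):
    """Raised when a simulated worker dies or stalls on a task."""


def map_chunk(chunk, index, attempt, failures):
    """Count words in one chunk, unless this (index, attempt) is set to fail."""
    if (index, attempt) in failures:
        raise WorkerFailed(f"chunk {index}, attempt {attempt}")
    return Counter(chunk.split())


def word_count(chunks, failures=frozenset(), max_attempts=3):
    """Run every map task, re-running failed ones, then merge the partial counts."""
    total = Counter()
    for index, chunk in enumerate(chunks):
        for attempt in range(max_attempts):
            try:
                partial = map_chunk(chunk, index, attempt, failures)
            except WorkerFailed:
                continue  # hand the same task to another attempt
            total += partial  # reduce step: merge this chunk's counts
            break
        else:
            raise RuntimeError(f"chunk {index} failed {max_attempts} times")
    return total
```

Because each map task is independent and has no side effects, re-running one is safe, so no lock across the cluster is needed to get the right answer.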