Category Archives: CS / Science / Tech / Coding

Command line history

Jude Robinson :: The single most useful thing in bash

Create ~/.inputrc and fill it with this:

"\e[A": history-search-backward
"\e[B": history-search-forward
set show-all-if-ambiguous on
set completion-ignore-case on

This allows you to search through your history using the up and down arrows … i.e. type cd / and press the up arrow and you'll search through everything in your history that starts with cd /.

Wow. That is not an exaggeration at all: the most useful thing. I am so thrilled to finally be able to search my shell history the same way I can my Matlab history. I've been able to do this there for ages and my mind still hasn't caught up with not being able to do it in the shell.

If it's not clear to you why this is useful or why it pleases me, I don't think there's anything I can do to explain it. Sorry.


PS Anyone have first-hand experience with the fish shell? The autocompletions and inline, automatic syntax highlighting seem clever. I need to get around to giving it a try on one of my boxes.


Pi

The Economist :: Babbage Blog :: Humble Pi

The Raspberry Pi is the brainchild of a couple of computer scientists at Cambridge University. Back in 2006, they lamented the decline in programming skills among applicants for computer-science courses. ... Over the past ten years, computer-science students have gone from arriving at university with a knowledge of several assembly and high-level programming languages to having little more than a working knowledge of HTML, Javascript and perhaps PHP—simple tools for constructing web sites. To learn a computer language, “you’ve got to put in your 10,000 hours,” says Dr Upton. “And it’s a lot easier if you start when you’re 18.” Some would say it is even better to start at 14.

The problem is not a lack of interest, but the lack of cheap, programmable hardware for teenagers to cut their teeth on. For typical youngsters, computers have become too complicated, too difficult to open (laptops especially) and alter their settings, and way too expensive to tinker with and risk voiding their warranty by frying their innards.

I don't see the connection between learning to code and having super-cheap hardware. Back when I was a kid learning to program I actually had to pay real money for a compiler. (Anyone else remember Borland Turbo C++?) Now you're tripping over free languages and environments to use, including many that run entirely through your browser so there's zero risk to your machine.

Honestly, how many teens are going to go full David Lightman and do hacking serious enough to put their hardware at risk? Is the goal to make sure teens have the opportunity to start learning to code before college, or to give them hardware to tinker with? Those are both fine goals. Being a software guy I'd put more weight on the former, but the important thing is that the ways to accomplish them are completely different.

The Pi is a great way to meet the goal of giving people cheap hardware to experiment with. But if the goal is to give kids an opportunity to start racking up their 10k hours in front of an interpreter or compiler then projects like Repl.it are a lot better. (Repl.it has in-browser interpreters for JavaScript, Ruby, Python, Scheme and a dozen other languages.)

For starters, [your correspondent] plans to turn his existing Raspberry Pi into a media centre. By all accounts, Raspbmc—a Linux-based operating system derived from the XBox game-player’s media centre—is a pretty powerful media player. The first task, then, is to rig the Raspberry Pi up so it can pluck video off the internet, via a nearby WiFi router, and stream it direct to a TV in the living room. Finding out not whether, but just how well, this minimalist little computer manages such a feat will be all part of the fun.

I did this exact project about a month ago, and couldn't be more pleased with either the results or the fun of getting it to work. I still have to tinker with some things: the Vimeo plugin won't log into my account, and I need to build a case. Other than that, I wish I had done this a long time ago.


Ruby

Coding Horror :: Jeff Atwood :: Why Ruby?

I've always been a little intrigued by Ruby, mostly because of the absolutely gushing praise Steve Yegge had for the language way back in 2006. I've never forgotten this.

For the most part, Ruby took Perl's string processing and Unix integration as-is, meaning the syntax is identical, and so right there, before anything else happens, you already have the Best of Perl. And that's a great start, especially if you don't take the Rest of Perl. But then Matz took the best of list processing from Lisp, and the best of OO from Smalltalk and other languages, and the best of iterators from CLU, and pretty much the best of everything from everyone.

Amen. My Ruby still isn't as idiomatic as I would like it to be, but even at my intermediate level I'm so pleased with how it gets out of the way and lets me do what I want no matter what paradigm I'm thinking in at the time.

Atwood has this to say himself, which also gets my thumbs-up:

Ruby isn't cool any more. Yeah, you heard me. It's not cool to write Ruby code any more. All the cool people moved on to slinging Scala and Node.js years ago. Our project isn't cool, it's just a bunch of boring old Ruby code. Personally, I'm thrilled that Ruby is now mature enough that the community no longer needs to bother with the pretense of being the coolest kid on the block. That means the rest of us who just like to Get Shit Done can roll up our sleeves and focus on the mission of building stuff with our peers rather than frantically running around trying to suss out the next shiny thing.

I suppose it's in the nature of craftsmen to get a bit giddy about their gear, but at a certain point enough is enough. (This led to some frustration when I was learning photography a couple of years ago; photography amateurs seem especially prone to gear obsession.) Yeah, definitely choose the right tool for the job, but don't forget the point is the job, not the tool.

I've endured far too many conversations in the break room, and at barbecues, and at happy hour about the supremacy of various languages.

I don't care which languages people are going to be drooling over at SIGPLAN this year.

I don't care what static analysis theorems you can prove using which language.

I don't care how cool your tools are.

I care how cool your results are.


"Social networks in fact and fiction"

The Endeavour :: John Cook :: Social networks in fact and fiction

In a nutshell, the authors hope to get some insight into whether a myth is based on fact by seeing whether the social network of characters in the myth looks more like a real social network or like the social network in a work of deliberate fiction. For instance, the social networks of the Iliad and Beowulf look more like actual social networks than does the social network of Harry Potter. Real social networks follow a power law distribution more closely than do social networks in works of fiction.

If you read one version of Beowulf and it's not Seamus Heaney's translation you better have a very good reason.
Cool.

I vaguely remember reading about some astronomers using Homer's descriptions of constellations to pin down dates for various events in the Odyssey. Perhaps it was this PNAS paper by Baikouzis & Magnasco?

It seems however that an accurate historical account might have a suspicious social network, not because the events in it were made up but because they were filtered according to what the historian thought was important.

Indeed. Although I suspect this form of Narrative Bias would be less of a problem with Beowulf, the Iliad, etc., because the composers of those tales, and their audiences, had less exposure to the way fiction is "supposed" to be.

I would like to see someone do a similar analysis for contemporary non-fiction. I prefer authors who tell a sprawling, tangled, less narratively driven story (Keay, Mann, Lewis, and Mukherjee come to mind) to those who squeeze everything into a more conventional structure. It's difficult for me to accept a story that takes place in a high-causal-density environment and yet somehow only a couple of people have any agency.

Nate Silver apparently feels the same way:

When we construct these stories, we can lose the ability to think about the evidence critically. Elections typically present compelling narratives. Whatever you thought about the politics of Barack Obama or Sarah Palin or John McCain or Hillary Clinton in 2008, they had persuasive life stories: reported books on the campaign, like Game Change, read like tightly bestselling novels.
— The Signal and the Noise, p. 59

Those are not the books I'm interested in.


NLP is very much not my area, but I can't help but wonder about automatically generating some sort of metric like this for a book. Count up how many proper names appear, and build a graph of them based on how closely they are linked in the text. (That is, two names appearing often in the same sentence are more closely linked than two which appear in consecutive paragraphs.) Perhaps simply looking at the histogram of name occurrence frequency might give you some preliminary ability to separate books into "realistic social structures" and "fiction-like social structures."
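
Here's a minimal sketch of the kind of thing I mean, in Python. It punts on real named-entity recognition and just treats runs of capitalized words as "names," so the sentence splitting, the regex, and the input file name are all crude placeholders rather than anything rigorous:

import re
from collections import Counter
from itertools import combinations

def name_stats(text):
    # Split on sentence-ending punctuation (crude, but good enough for a sketch).
    sentences = re.split(r'(?<=[.!?])\s+', text)
    counts = Counter()   # how often each "name" appears overall
    links = Counter()    # how often each pair of names shares a sentence
    for s in sentences:
        # Treat runs of capitalized words as names -- a stand-in for real NER.
        names = set(re.findall(r'\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*', s))
        counts.update(names)
        links.update(frozenset(pair) for pair in combinations(sorted(names), 2))
    return counts, links

counts, links = name_stats(open('book.txt').read())   # 'book.txt' is hypothetical
print(counts.most_common(20))   # histogram of name frequencies
print(links.most_common(20))    # the most strongly linked pairs of names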


Conway's Game Of Life in APL

APL is the sort of thing I imagine "Programmer-at-Arms" Pham Nuwen used in Vernor Vinge's "Zones of Thought" books. It seems both incomprehensibly magical and impeccably rational, forming the shortest bridge between what the programmer is thinking and what the computer is calculating.

Of course it also seems like you'd be building a bridge no one else could walk over. Not even you-one-year-from-now. I have a hard enough time figuring out what I was thinking when I read Ruby I wrote a year ago.


For more, here's a written tutorial by the same fellow. It had this helpful postscript regarding "glider guns":

Artillery in a finite manifold is a dangerous business.

Hehe. Good advice, I suppose.

Aficionados of the 1979 arcade game "Asteroids", which took place on a torus, will recall that the manufacturers thoughtfully gave their cannon a range just short of the screen's circumferences.

Found that out the hard way back in undergrad when we wrote a genetic algorithm which learned to play Asteroids.


Manfred Mohr

I saw this post on Manfred Mohr's (very!) early digital art literally minutes after picking up a book about Mohr's work from the library yesterday. Total coincidence.

p231, Manfred Mohr

I like a lot of the pioneering digital artists' work. I think the constraints gave them a lot of focus. (Limits boost creativity.) You don't see many of those first-generation guys still producing like Mohr does, though. It's also refreshingly unusual that Mohr hasn't gotten bogged down in self-consciously retro, 8-bit, glitch kitsch.

(Dear [far too many digital artists]: I also remember that sprite animations were once a cutting-edge thing. Move on.)

Back when I was taking Computational Geometry I remember getting sidetracked working out an algorithm based on the six-dimensional rotations in Mohr's space.color.motion. At some point I need to dig out my notes for that and implement it.

If you like Mohr you may want to check out this interview as well. I like what he says at the beginning about taking a simple idea and running with it. Also the questions about an artist working like a scientist were good. This also stood out:

So I went higher and higher in dimension and it got more and more complicated. I could do different things, but I could not explain to people because it was so complicated. '6-D cube, what kind of crap are you talking about?'

I've had that experience. Here's a free tip for you: do not try to explain something to a room full of psychologists by beginning "So if you were to imagine yourself moving around on the surface of a 35-dimensional hypercube..."

This interview also displays the link between algorithmic art and political economy. Mohr sets up a system, and lets it do its thing. The artwork emerges from the software. Legislators and regulators and executives set up a system and it does its thing. The economy emerges from society. Mohr's work, like all good algorithmic art, reminds me of Hayek's description of spontaneous order: "of human intent but not of human design."


PS Prosthetic Knowledge points out that Mohr also maintains a very thorough YouTube channel. Would that all digital artists did the same.


Reader

Google is shutting down the Google Reader service, as you may have heard.

I am not happy about this, but I'm also not throwing fits about it. I spent more time in my Reader tab than in any other, by a huge margin. I'm sure there's some other service out there that will fill the void. Google is also being pretty considerate about this: they're giving users several months' warning and they've made tools available to export your subscription data.

RWCG has had a couple of posts taking the wind out of the sails of all the people who are tearing out their hair and rending their garments and casting the evil eye towards Mountain View. Overall I think he's on the right track, but this post doesn't quite line up for me:

RWCG | Crimson Reach | Google Reader: The end of the ‘free’ era

How dare Google shut down a service I made heavy use of for years and years and paid nothing whatsoever for! [...] I wonder if this event marks the end of the ‘free’ era. You know the free era, it’s the era where this was the prevailing business philosophy:

1. Drive competing services out of business with a free service (subsidized by a profitable product).
2. Cancel free service.
3. ???

Actually no, that’s still snarky. It’s more like this:

1. Company gives away something for free.
2. People like it and use it.
3. This makes people think the company is cool. Their friend.
4. When it goes public, they buy its stock, cuz it’s so cool, and everyone they know uses and likes it. Surely that’s gotta mean something, stock-wise.
5. People continue to use the free thing and come to not only rely on it but expect it as their birthright.
6. ...
7. Profit?

According to this philosophy, giving away cool stuff for free was the wave of the future. It’s what all smart, cool companies did. Only backward knuckle-dragging idiots couldn’t figure out how this added up to a business model. Economic realities were no longer reality.

I think he's short-changing the business potential of a product ("product"?) like Google Reader. There's no direct line between "make Reader & give it away free" and "profit," but this approach still has some uses.

1. Yes, it makes people think you're cool and friendly and not-evil. Many firms do things for that reason alone. They spend billions on things way outside their core competencies just so people think they're cool and friendly. Isn't that the entire point of the "Corporate Social Responsibility" fad?

Google paying its employees to create Reader is in its wheelhouse; it makes sense. Far more sense than, for example, Chrysler paying its employees to lay bathroom tile in poor neighborhoods.

2. Providing services like Reader makes Google look cool to people generally, but more importantly it makes them look cool to geeks. It's a punchline that a firm's number one asset is its people, but that's pretty true of Google. They can do what they do because they get the pick of the litter of hackers.

I went to a career fair my CS department sponsored a month or so ago. The line for the Google table was literally out the door. Most people in my graduate program are angling for academic jobs. Google is one of maybe four private companies that people will be impressed you're interviewing with.

3. Projects like Reader not only motivate applicants, they motivate employees.

Talent and productivity are extremely unevenly distributed in coders. The best are many orders of magnitude better than the median; the bottom decile (conservatively) have negative productivity. You usually don't get the best by offering them orders of magnitude more money, you get them by giving them cool problems to work on.

If you're excited about spending some time developing X, there's a good chance Google will let you do that. (At least in comparison to if you were working at Initech.) What's more, there's a chance Google will roll out X to millions of people, like they did with Reader. I can't stress enough how big of a motivator that can be.

4. Google's strategy for a while has been that anything they do to make people want to use the internet more is good for them, because they capture a dominant slice of the ad revenue online. More people spending more time online is better for Google, period. Reader fits into that. That's not a strategy that will work forever, or for many (any?) other companies. It can also be used to justify a lot of wasted "investments." But it's also true.

5. Google lives and dies off of data. Reader could have been generating that for them. I have no idea how much they actually learned from people's reading habits, if anything, but it had the potential to be a goldmine.

If you can predict people's age, race, religion, political party and drug use only using publicly available "likes" on Facebook, what could you do with my online reading habits? (Answer: sooooo much.)


I have no idea if it was a good idea or a bad one for Google to shut off Reader. I'm skeptical, but I realize I have none of the facts. I can't imagine it would cost that much to keep it running as is, especially compared to their other projects. I'm not sure what better use they have for the resources they're redeploying. I'm curious that they didn't even try to make it ad supported. Hell, I would have even paid directly to keep using it, and I pay for approximately zero online subscriptions.

Again, I don't know what they know. But I do know that "there's no revenue from this free product; let's shut it down" should not be the beginning and end of this decision making.


Computational Cartography & Gerrymandering

My brief recent dabbling with computational cartography (see Part 1 & Part 2) was inspired by Neil Freeman. His project was initiated as a (speculative) reform to the Electoral College.


Incidentally, this is a great idea. Every four years, when whichever party loses the White House starts complaining about how silly the Electoral College is, I feel compelled to point out that it's actually an excellent way to maximize the expected impact of any one voter's vote. The catch is that this requires states to have equal populations, which obviously they don't.

I just ran a quick Monte Carlo to check the numbers on this. Let's say you voted in 10 million elections in your life, and in each one there were 100 million voters. (This is obviously way too many elections, but I wanted something that didn't require me to report the results in scientific notation.) Let's further stipulate that every one of those races was close: a dead heat with a standard deviation of 1%. In this idealized case you could expect to be the tie-breaking vote — thus having your vote "matter" — in less than four of them. Four in ten million.

cnt = 10000000;       % elections you vote in
pop = 100000000;      % voters in each election
ties = [];
for i = 1:1000,       % repeat the whole experiment 1000 times
  % Each election is a dead heat: mean pop/2, std dev of 1% of the electorate.
  results = round(random('normal', pop/2, pop*.01, [cnt 1]));
  ties = [ties; sum(results == pop/2)];
end;
mean(ties)     % => ans = 3.9160


If, on the other hand, there were 51 states of 1,960,784 voters each, you would be the tie-breaker for your state 203 times out of those 10 million elections. That would only matter if the other 50 states were split 25/25, which you can expect to happen about 11.2% of the time. So your chance of being the deciding vote goes from 4-in-10,000,000 to roughly 23-in-10,000,000, meaning your vote "matters" more than five times as much in the (modified) Electoral College system as it does in a single nationwide popular vote.
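
For the curious, here's a rough back-of-the-envelope check of those state-level numbers in Python, under the same dead-heat assumption as the Monte Carlo above (every race normally distributed with a standard deviation of 1% of the electorate) rather than a re-run of the simulation:

from math import comb, sqrt, pi

elections = 10_000_000
state_pop = 100_000_000 // 51     # ~1,960,784 voters per state
sigma     = 0.01 * state_pop      # dead heat: std dev of 1% of the electorate

# Chance of an exact tie in your own state: the normal density at the mean,
# taken over a one-vote-wide bin.
p_state_tie = 1 / (sigma * sqrt(2 * pi))
print(elections * p_state_tie)              # ~203 state ties in 10M elections

# Chance the other 50 states (each effectively a coin flip) split 25/25.
p_split = comb(50, 25) / 2**50
print(p_split)                              # ~0.112

print(elections * p_state_tie * p_split)    # ~23 in 10M: you decide the whole thing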

All of this is far too numerical to explain to someone in casual conversation, so my standard defense of the Electoral College is to agree that we should abolish it as soon as the winner of the World Series is decided by whichever team scores the most runs in 63 innings rather than the team which wins four of seven games. The reasons to prefer both the Electoral College and best-of-seven champions are the same, but I'm already way too far off track to cover that now.


Okay, back to the point. Equally-sized states would be cool. But the chances of us redrawing state boundaries are zero.

Despite this, I think an automated system to draw boundaries for equally populated regions is far more than just an exercise. Why? Because while our state boundaries are effectively carved in stone by tradition — or in the case of my home of Maryland, literally carved in stone — there are political borders that are re-drawn with regularity: congressional districts.

That's how we end up with absurdities like this:

Maryland's (totally reasonable, not at all farcical) 3rd Congressional District

Gerrymandering was one of the chief complaints that Glenn Reynolds made in his recent EconTalk appearance:

I'm also pretty much agnostic on term limits. I used to be opposed to them. And as I see how effective gerrymandering is, which I think is one of the reasons people hate Congress but love their Congressmen, or at least tolerate their Congressmen, because of gerrymandering, I think that's something that is difficult to deal with and probably wouldn't be helped that much by term limits. If you've got a gerrymandered district, especially with modern gerrymandering technology, the new guy who gets elected in it is likely to look an awful lot like the old guy because the district is so one-dimensional.

Taking the "especially with modern gerrymandering technology" as a given, how can we use technology to pre-empt this trend? It's usually best to address technical problems with technical solutions rather than social ones.

One option is to draw the boundaries the same way we do now, but reject them if their score on some measure falls below a certain acceptable level. For example, IDV Solutions has ranked districts by perimeter/sqrt(area). This is, by their own admission, a crude way to do things. To name just one problem, it favors districts bounded by straight state borders like the Mason-Dixon Line over those bounded by meandering borders like the Potomac River.
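
For what it's worth, that perimeter/sqrt(area) score is easy to compute yourself from a district's boundary polygon. A quick sketch in Python, with made-up coordinates and the shoelace formula for area:

from math import hypot, sqrt

def compactness(vertices):
    """perimeter / sqrt(area) for a polygon given as a list of (x, y) vertices."""
    n = len(vertices)
    perimeter = sum(hypot(vertices[(i + 1) % n][0] - vertices[i][0],
                          vertices[(i + 1) % n][1] - vertices[i][1])
                    for i in range(n))
    # Shoelace formula for the enclosed area.
    area = abs(sum(vertices[i][0] * vertices[(i + 1) % n][1]
                   - vertices[(i + 1) % n][0] * vertices[i][1]
                   for i in range(n))) / 2
    return perimeter / sqrt(area)

square = [(0, 0), (1, 0), (1, 1), (0, 1)]
strip  = [(0, 0), (10, 0), (10, 0.1), (0, 0.1)]
print(compactness(square))   # 4.0 -- the best any rectangle can do
print(compactness(strip))    # ~20.2 -- long and skinny scores much worse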

Other measures are possible, obviously. One that I think would be useful is based on nearest neighbors. In a non-gerrymandered map the vast majority of addresses will have all of their k nearest neighbors in the same district. As districts become more complex, more people will have large proportions of their neighbors in other districts. If you limit the neighbors to those within the state, which is reasonable, you minimize the Mason-Dixon/Potomac problem.
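
A sketch of that neighbor measure in Python, using scipy's k-d tree. The coordinates and district labels here are toy placeholders, and a real version would also need to restrict neighbors to addresses in the same state:

import numpy as np
from scipy.spatial import cKDTree

def neighbor_score(coords, districts, k=10):
    """Average fraction of each address's k nearest neighbors in its own district."""
    tree = cKDTree(coords)
    _, idx = tree.query(coords, k=k + 1)   # k+1: each point is its own nearest neighbor
    same = districts[idx[:, 1:]] == districts[:, None]
    return same.mean()                     # near 1.0 for compact maps, lower when tangled

# Toy example: a unit-square "state" split cleanly down the middle into two districts.
rng = np.random.default_rng(0)
coords = rng.uniform(0, 1, size=(2000, 2))
districts = (coords[:, 0] > 0.5).astype(int)
print(neighbor_score(coords, districts))   # close to 1.0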

Another measure, which I've thought out less fully, would be to measure the Kolmogorov Complexity of the map. The descriptive length of the Maryland 3rd is... well, "long" doesn't cut it. Compare that to the former definition of the Maryland 1st: "the Eastern Shore up to the Susquehanna River." The Kolmogorov Complexity itself is uncomputable, but I think we have very good proxies in this case. Whoever draws the maps must be able to present in writing, with no graphics, an unambiguous definition of the districts in fewer than k words. Or alternatively, they must be able to describe the districts orally to a judge, without visual aids, in c minutes.

There's a general principle here: If you can't explain it succinctly, it's a bad rule.

Another approach might be to measure the curvature or the fractal dimension of boundaries between districts. I've thought even less about this though.

The approach of capping some metric has the virtue of being simple, objective, and easily implementable. Whatever partisan committee now draws the borders presents its map to an adjudicator, who passes it through an established, preferably open-source, program which calculates the metrics. If the metrics are above some pre-established threshold, the map is rejected. This is already an improvement over the subjective way we do things now.

A second option, suggested by David Friedman, is to have the partisan committee submit not a finished map, but a program to generate a map. The inputs to the program could contain population data, municipal boundaries, etc. but no information about party affiliation or past voting records, or any proxies for them. There would also be a cap on program length to keep anyone from submitting what is essentially just a precomputed lookup table.

Given the low numeracy I see in most legislators, I think it would be a hard sell to get them to produce such a program, but the idea has tremendous appeal to me. The whole point is to remove subjectivity and make the decision only on the basis of explicitly acceptable criteria, which is exactly what computers do: they're entirely dispassionate and they only know what you tell them.

We could take this one step further and simply codify the program to be used rather than having partisan committees writing their own after every census. I find a technocratic Bureau of Districting Algorithms much easier to imagine than local politicians wrestling with GIS algorithms.


Armen Alchian & Unnecessary Mathematical Fireworks

Cato Daily Podcast :: Remembering Armen Alchian

Don Boudreaux discussing Armen Alchian's preference for clear prose over "mathematical pyrotechnics" reminded me of a few neural networks researchers I know. I won't name names, because it wasn't a favorable comparison. There's far too much equation-based whizz-bangery going on in some papers.

I used to think the problem was insufficient sophistication in my own math background, but I've recently heard independently from two very smart people in our Applied Math/Scientific Computing program that they also find the math in a lot of these papers to be more of an obfuscating smoke screen than a clarifying explication. If they find it hard to follow, I've got good reason to believe the problem isn't just me.
