Today is a day of mass activism, as individuals around the country contact their representatives in DC to oppose dragnet surveillance. While I’ve emailed my representatives already, I’m sharing the content of my messages below the break as an open letter to raise awareness of these issues and the collective efforts taking place today through thedaywefightback.org.


Posted in Activism, Politics

Don’t “confuse morality with frugality”

We look at overhead as a dirty word in charitable work, but investing in marketing and investing in the charity itself grows the overall funds available for, e.g., cancer research.

Example: I raise $100k from a generous donor. I could spend 100% of that on research or services for the poor and we’d be $100k closer to solving the problem. Or I could spend that $100k creating and marketing a fundraising event. So long as that event earns back at least that $100k, I’ve achieved the same ends AND raised awareness. If it garners more than $100k, then we’re actually gaining traction toward solving the problem.

Ideally, this example would go even further, say, doubling the money so that we could use $100k for charitable work AND have $100k to do the same thing again the next year.

Quotations from the TED talk which inspired this post. Originally posted to Google+.

Dan Pallotta’s talk includes the line “[W]hen you prohibit failure, you kill innovation”. We need to pay attention to where resources are going, yes, but we also need to allow for the fact that it takes many failures to lead to success, as much in the non-profit world as in the for-profit world.

Posted in Activism

Ohio State Four-Miler

Today was the first-ever Ohio State Four-Miler, raising money for general cancer research funds at the James. It was also my first (official) timed race.

Dave, still standing, after the Ohio State Four-Miler

I’m happy with my 9:36 minute mile pace and 38:23 overall. I fell out of the habit of running regularly after realising I’d be out of town for the Columbus (Half-)Marathon last month, but now that I’m looking forward to the Capital City Half-Marathon in May, I’ve got a good reason to get back into it :D

Posted in Activism, Days of my Life, Family and Friends

Stop Watching Us

There’s a rally today in Washington, D.C.: a rally against domestic surveillance. Today is the day the organisers deliver half a million signatures to Congress, demanding simply that Congress reveal to the US public the extent of the NSA’s domestic operations.

So far I’ve tweeted @FoxNews, @CNN, and @MSNBC to ask about the lack of coverage of this story, and sent the following message to NBC News:

Dear NBC News,

I’m curious as to why your front page coverage online today is a story about the wedding gap instead of a story about the people’s reaction to the biggest domestic surveillance scandal in US history. Stop Watching US (https://rally.stopwatching.us/) is a rally happening today in Washington, DC, and getting no coverage from major media outlets. Why are you silent about government overreach and the public’s reaction? What happened to the Fourth Estate providing an additional check and balance to keep the government in line?

I am disappointed.

David M Howcroft

“This isn’t about right and left. This is about right and wrong,” said one of the rally participants during the livestream. While a soundbite risks oversimplifying the situation, this issue does raise questions about our deepest held principles. We are proud of our values and try to promote them worldwide, but if we don’t promote them domestically what’s the point? We’re sending the world a clear message that we don’t give a damn about the values we use as justification for our global interventions.

I didn’t mean for this to turn into a rant, but it’s fast heading that way. How can we unite against the challenges facing our nation when public discourse on these issues is limited to local discussions of national and global issues? National media aren’t covering this rally, and would rather talk about Snowden’s background and his girlfriend than about the fact that the government is violating our rights. Without media coverage to raise awareness of the issue and bring us the facts, how can the populace be informed when they vote this November and next November and November in three years’ time?

I’m frustrated. I’m disappointed. Please tell me if you have ideas as to how to improve this state of affairs.

Meanwhile, you can sign the Stop Watching Us petition yourself. Tell Congress to reveal the extent of domestic surveillance.

Posted in Activism, Politics

Ensuring unique subjects on Mechanical Turk

I recently ran an experiment on Amazon’s Mechanical Turk (MTurk) to collect some naturalness judgements from native speakers of American English. We wanted to make sure that each subject could submit only one survey, corresponding to one set of stimuli, but there didn’t seem to be a good way to do this with any of the default MTurk configurations. We had to either (1) post each set of stimuli as a different HIT (Human Intelligence Task) and use some kind of qualification system to make sure each subject could complete only one survey or (2) post all of the surveys as the same HIT and host the survey on our own server.

We chose option (2), which uses the HIT as a gatekeeper: each HIT can have any number of assignments, but a given Turker (worker on MTurk) can only complete one assignment for each HIT. Our HIT used the ExternalQuestion format to grab a survey from our server when a Turker accepted an assignment. A script on our server was responsible for checking each request and providing either (a) the next available survey or (b) the survey that the Turker was originally assigned and still needed to complete.
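The request-handling logic can be sketched roughly as follows. This is a minimal illustration, not the script we actually ran: it assumes assignments are kept in an in-memory dictionary rather than a database, and the names `SurveyAssigner` and `survey_for` are made up for the example. It relies on the fact that MTurk appends a `workerId` parameter to the ExternalQuestion URL once a Turker has accepted an assignment.

```python
from typing import Dict, List, Optional


class SurveyAssigner:
    """Hand each worker exactly one survey (hypothetical helper, in-memory only)."""

    def __init__(self, survey_ids: List[str]) -> None:
        self._unassigned: List[str] = list(survey_ids)  # surveys nobody has yet
        self._by_worker: Dict[str, str] = {}            # workerId -> survey id

    def survey_for(self, worker_id: str) -> Optional[str]:
        """Return the worker's existing survey, or claim the next free one.

        Returns None when every survey has already been handed out.
        """
        # (b) A returning Turker gets the survey they were originally assigned.
        if worker_id in self._by_worker:
            return self._by_worker[worker_id]
        # (a) A new Turker gets the next available survey, if any remain.
        if not self._unassigned:
            return None
        survey = self._unassigned.pop(0)
        self._by_worker[worker_id] = survey
        return survey


assigner = SurveyAssigner(["survey_A", "survey_B"])
first = assigner.survey_for("WORKER1")    # new worker: next available survey
again = assigner.survey_for("WORKER1")    # same worker: same survey as before
second = assigner.survey_for("WORKER2")   # different worker: a different survey
```

A real deployment would also need persistent storage and some form of locking, so that two simultaneous requests cannot claim the same survey.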

The code I wrote to do this isn’t ready for public dissemination (I had to jump in manually to tweak things here and there during run-time; ew), but shoot me a message if you want more information. My ideal is to assemble a group of researchers willing to contribute to a public codebase on GitHub so that we can make these research methods more accessible to our less technical colleagues and refocus our efforts on research rather than infrastructure.

Posted in Research

Boxes, blind men, and elephants

In his blog post Putting people in boxes, John Cook describes the plight of the interdisciplinarian when seeking, for example, new employment. Academics must regularly deal with emboxment: we try to describe our research in ways that are both expressive of nuance and compatible with preconceived notions about what research is worthwhile. John’s examples are limited to instances where we’re trying to please a single gatekeeper, but the reality is that we’re often seeking to please multiple audiences: friends & family, peers & colleagues, and university administration, just to name the most common.

I don’t think this is a bad thing. We might think that compressing our research into a few pages for a grant proposal, or a list of papers for an administrator’s review, or an accessible blurb for our loved ones is a bit like introducing blind men to an elephant, but this story is one of the better versions of that tale. In our telling, we are guiding the blind men to discover the parts they are most interested in. This helps them get the most from their encounter with the creature and likely shows us something new about it as well. Retelling the saga of our research helps keep us focused and reminds us of the larger context we’re working within. And all by narrowing the scope of the description. Interesting how that works, isn’t it?

Posted in Research

Short Note: “Aw, Snap!” in Google Chrome on Twitter, Google Calendar, etc. (in Linux)

Running Fedora 15, I had a problem around October of last year where Chrome crashed any time I went to Twitter or Google Calendar (among other sites). I somehow resolved the problem, but couldn’t remember how when it resurfaced after my upgrade to Fedora 16. Somewhat interestingly (maybe?), I don’t think the problem arose immediately after the upgrade; it may only have come up after I updated Chrome as well.

Anyway, I went through all of Google’s recommended procedures for dealing with the “Aw, Snap!” page, to no avail. After trying a number of other suggestions, I had given up and figured I’d just use Firefox for these sites. A few days (weeks?) later, I decided to give it a few more minutes of my time and found a site with a useful answer.

So this is the solution that worked for me, which resets the SELinux security contexts of everything under ~/.config to their defaults:

restorecon -R ~/.config


Posted in Uncategorized

Ranking Sentences, and doing it *right*

For a project I’m working on, I’ve got a collection of paired sentences where one is classified as more complex than the other. I’ve got some thoughts about interesting features to use in the context of measuring sentence complexity, but before delving into those I’m trying to do a sanity check: if really simple things (e.g. word length, sentence length) can reliably (i.e. > 80 or 90 % of the time) indicate whether a sentence is simple or complex, then there’s less to be gained by using the more involved measures I’m interested in.
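To make the sanity check concrete, here is a minimal sketch of a rule-based baseline: guess that the sentence with more words is the complex one, breaking ties on mean word length, and measure how often that guess is right. The function names and the toy sentence pairs are invented for illustration; whitespace tokenisation is an assumption, and this is not the feature set from the project itself.

```python
from typing import List, Tuple


def mean_word_length(sentence: str) -> float:
    """Average number of characters per whitespace-separated token."""
    words = sentence.split()
    return sum(len(w) for w in words) / len(words) if words else 0.0


def baseline_accuracy(pairs: List[Tuple[str, str]]) -> float:
    """Fraction of (complex, simple) pairs the length heuristic gets right.

    The heuristic picks the sentence with more words, breaking ties on
    mean word length; a complete tie counts as a miss.
    """
    correct = 0
    for complex_sent, simple_sent in pairs:
        key_c = (len(complex_sent.split()), mean_word_length(complex_sent))
        key_s = (len(simple_sent.split()), mean_word_length(simple_sent))
        if key_c > key_s:
            correct += 1
    return correct / len(pairs) if pairs else 0.0


# Toy (complex, simple) pairs, invented for illustration only.
toy_pairs = [
    ("The committee deliberated extensively before reaching a verdict.",
     "The group talked and then decided."),
    ("He ran.", "He sprinted away quickly."),
]
accuracy = baseline_accuracy(toy_pairs)
```

If a heuristic like this already clears the 80 to 90% mark on the real data, the fancier features have little room left to help.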

Anyway, I started out treating this as a classification problem using NLTK’s NaiveBayesClassifier and MaxentClassifier and labelling each sentence either simple or complex. This was pretty dismal at single-sentence classification (getting ~61-63%) but did alright on larger chunks (~91-93% for texts consisting of 25-30 sentences). But my dataset consists of paired sentences and I’m interested in applications where the question I’m asking is, “which of these two sentences is more complex?” I want to place them on a scale, if you will.

To that end, I started using SVMrank and felt I was getting pretty good results. Just a one percent improvement (about 300 sentence pairs in this dataset), but I was using a somewhat hackish imitation of the feature I really wanted to add, so 1% was a promising start. Upon further reflection, however, I couldn’t recall exactly what parameters I’d passed to what runs of SVMrank, so I methodically reran my tests and, lo and behold, the difference disappeared: my 1% gain shrank to 0.1%. In the course of messing around with things, I also discovered that there was a 20% drop in accuracy when I trained on more data, which sounds like overfitting, but wasn’t a problem I anticipated with SVMs.
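For readers who haven’t used SVMrank, the sketch below shows one way paired sentences could be written out in its input format, where each line reads `<target> qid:<qid> 1:<value> 2:<value> ...` and targets are only compared within the same `qid`. Giving every pair its own `qid` and using targets 2 and 1 for the complex and simple sentence is an assumption about how to encode the data, not a record of my actual runs; the function names and feature values are made up.

```python
from typing import List, Sequence, Tuple


def svmrank_line(target: int, qid: int, features: Sequence[float]) -> str:
    """Format one example as '<target> qid:<qid> 1:<v1> 2:<v2> ...'."""
    # Feature indices start at 1 and must appear in increasing order.
    feats = " ".join(f"{i}:{v:g}" for i, v in enumerate(features, start=1))
    return f"{target} qid:{qid} {feats}"


def pairs_to_svmrank(
    pairs: List[Tuple[Sequence[float], Sequence[float]]]
) -> List[str]:
    """Turn (complex_features, simple_features) pairs into ranking-file lines.

    Each pair gets its own qid, so rankings are only compared within a pair.
    The complex sentence gets the higher target value (2 versus 1).
    """
    lines = []
    for qid, (complex_feats, simple_feats) in enumerate(pairs, start=1):
        lines.append(svmrank_line(2, qid, complex_feats))
        lines.append(svmrank_line(1, qid, simple_feats))
    return lines


# Features here are [word count, mean word length]; the values are made up.
example = pairs_to_svmrank([([12, 5.5], [7, 4.0])])
```

Writing these lines to a file and logging the exact `svm_rank_learn` command next to each result (including the `-c` value) would have spared me the problem of not knowing which parameters produced which numbers.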

So now I’m back to sanity checking using NLTK and examining paired sentences where I have either the complex sentence first or the complex sentence second. In classifying these samples as “complex left” or “complex right”, the NB classifier is getting around 71% using the most basic features (the ones that were giving 61% on single sentence classification as simple or complex). Given my recent experiences with SVMrank, though, I’m hesitant to assume this means what I’d like it to mean and move on to the next problem.
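Here is a minimal sketch of how the “complex left” / “complex right” examples could be built. It assumes whitespace tokenisation, two discrete comparison features, and a simple alternation of orientations to keep the labels balanced; all of the names and sentences are invented for illustration and the real feature set differs. The result is a list of (featureset, label) tuples, which is the shape NLTK’s `NaiveBayesClassifier.train()` accepts.

```python
from typing import Dict, List, Tuple

Featureset = Dict[str, str]


def compare(left_value: float, right_value: float) -> str:
    """Say which side has the larger value: 'left', 'right', or 'tie'."""
    if left_value > right_value:
        return "left"
    if right_value > left_value:
        return "right"
    return "tie"


def mean_word_length(words: List[str]) -> float:
    """Average token length in characters (0.0 for an empty list)."""
    return sum(len(w) for w in words) / len(words) if words else 0.0


def pair_featureset(left: str, right: str) -> Featureset:
    """Discrete features describing how the two sentences compare."""
    l_words, r_words = left.split(), right.split()
    return {
        "more_words": compare(len(l_words), len(r_words)),
        "longer_words": compare(mean_word_length(l_words),
                                mean_word_length(r_words)),
    }


def labelled_pairs(pairs: List[Tuple[str, str]]) -> List[Tuple[Featureset, str]]:
    """Alternate the orientation of (complex, simple) pairs to balance labels."""
    data = []
    for i, (complex_sent, simple_sent) in enumerate(pairs):
        if i % 2 == 0:
            data.append((pair_featureset(complex_sent, simple_sent), "complex left"))
        else:
            data.append((pair_featureset(simple_sent, complex_sent), "complex right"))
    return data


toy = [
    ("The committee deliberated extensively.", "They talked."),
    ("Precipitation is anticipated tomorrow.", "It will rain tomorrow."),
]
train_set = labelled_pairs(toy)
```

The features are kept discrete on purpose: NLTK’s Naive Bayes implementation treats every distinct feature value as its own category, so raw counts or averages would need binning first.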

Any thoughts on SVMrank and why I might be seeing overfitting-like behaviour? Any recommended tutorials for working through this kind of ranking problem? Comment below!

Posted in Research

To the Senator’s Office

A couple of weeks ago, as I was investigating PIPA and SOPA online, I found myself on americancensorship.org, clicking through to the page for Ohio. There I found contact information for my Senators and a message board to coordinate with fellow Ohioans under the banner of Ohio for Internet Freedom. Over the course of the subsequent weeks this hodgepodge group of folks seeking to do something about P.R.O.T.E.C.T. IP managed to arrange a meeting with Senator Sherrod Brown’s Deputy State Director, Beth Thames. This is the story of our meeting with her yesterday morning.

I arrived in Cleveland at around 9am and found the Senator’s office without a hitch. By about 20 to ten my fellow organizers, Matt Stafford and his father, had arrived and we chatted amongst ourselves until Beth came out to meet us. She greeted us with a smile and put us at ease as she welcomed us into her office to meet around a small table to discuss our concerns with the P.R.O.T.E.C.T. IP Act.

Matt suggested that I take the lead in our discussion, and so I opened by asking Beth for more details regarding the Senator’s position on the bill: why does he support it? what aspects of it does he believe will do good? While she could not officially comment for the Senator as to his position, she did say, as a member of his staff for fifteen years, that she believes his interest in the bill is at least in part driven by a desire to protect consumers. Noting that this was not an exhaustive representation of his support for the bill, she mentioned that he wanted to fight, among other things, the availability of counterfeit prescription drugs in the United States. I responded that, if his concern was counterfeit drugs, a more targeted bill might help him achieve those goals, and I was reminded that this was not his only reason for supporting the bill.

From there I launched into my argumentation against P.R.O.T.E.C.T. IP, running along two major themes: the bill is anti-competition and grants the government censorship powers. To explain the former, I cited the MPAA and RIAA as examples of the bill’s supporters. These industry groups represent what was, once upon a time, the new media: when recording came of age last century, it supplanted an entertainment industry driven by live performance. Now that these industries have established themselves, they don’t want to have to compete with the alternative distribution models that have become popular in the last decade: they don’t want to have to innovate. There was a bit more to flesh out the argument than that, but that’s the crux of it. I did not mention at the time the oft-cited figures of soaring profits in these industries despite (or, perhaps, caused by?) piracy and advertising-based distribution of their media for “free”.

After some discussion I went into the censorship component of the bill: we do not want to grant the government, let alone private companies, the ability to censor the internet. Even with discussion in the past week of dropping the DNS provisions, the bill grants the power to cut off search engine leads and advertising revenue, which is plenty to kill a website.

The web today is driven by user-generated content. This is what has spurred the internet’s growth in the past decade. But P.R.O.T.E.C.T. IP disincentivizes entrepreneurship. If I want to start a new site based on user-generated content, I have to hire dozens of additional people to moderate user comments or develop technology to automate this process. If I choose not to do this, I risk having users’ comments get my site blacklisted. Even if my domain name still resolves, losing search engine traffic and advertising revenue could put me, and all my employees, out of a job in a heartbeat. The business climate PIPA creates would stifle innovation and entrepreneurship on the web.

Following this initial line of argumentation, I invited Matt to share his views as well. Bringing up the DNS provisions, he pointed out that the bill runs counter to DNSSEC, a standard heretofore supported by the US government and designed to secure the internet against high-level hacking. Beyond the DNS component, we brought up the effect this has on consumers. For the technically savvy, there will be workarounds (we already have Tor for getting around the Great Firewall of China). For the less technically savvy, it’s likely that hacked versions of these workarounds will proliferate: if they’re not careful in what they download, they may well get a virus with their censorship-workaround software. For the still less technically savvy, these things won’t even be an option: their experience on the internet will simply never be the same.

This effect spills over into business as well. The United States of America is still a world leader in online technologies. Innovations in our tech sector drive the industry forward. But if we start burdening search engines and advertisers with these additional responsibilities, we could see them start moving abroad. “Easiest way around PIPA-firewalls? Use google.fr!” or some other service hosted elsewhere. Until the government starts blocking them too.

At a few points throughout our meeting, we did joke with Beth that this would put us in the same league as China and Iran. This led us to acknowledge the political rhetoric just waiting to be let loose: these bills are fundamentally un-American, stifling the free market and bringing about government censorship. Connecting this theme to the threat of businesses leaving the US, we discussed Google’s withdrawal from mainland China following years of attempts to compromise their ethics sufficiently to operate in the People’s Republic.

Matt also brought up his job at the Examiner, and the prospect of lost income not only for businesses but for individuals who freelance. The effects of the bill are far-reaching and would hinder his ability to contribute to the local economy of Cleveland. He also brought up the government-condoned vigilante justice portion of the bill. This portion, Section 5, encourages ISPs to violate our privacy and investigate the contents of our data transfers by saying that they shall not be held accountable for actions taken “in good faith” to serve the purposes of PIPA. This led me to suggest that the Senator should support a bill for net neutrality, making the ISPs utility providers who should provide “dumb pipes” to transfer our data, regardless of the content.

At the end of the meeting, Beth asked us if there was anything in this bill that we thought could be saved. I said yes. What can be saved is the spirit of support for law enforcement. I see this bill as a cry for help from law enforcement and the DOJ as much as a concerted anti-competition effort on the part of certain industries. But we can support law enforcement with additional resources and staff without new legislation.

I did also say that incorporating more due process in the bill might make it more palatable, though even with a trial in place rather than the mere request of the Attorney General or a rights-holder this amounts to a censorship bill. I stressed this final point heavily: the bill is fundamentally a bill to censor the internet, and that is entirely unacceptable.

Deputy State Director Beth Thames thanked us for sharing our concerns. Having taken our contact information at the beginning of the meeting, she let us know that the Senator’s office would be in touch sometime soon with their response to our meeting. I look forward to receipt of this information and will share what I can at that time.

Thanks for taking an interest in protecting our freedoms online, and thanks for reading my tale. Please share your thoughts on the meeting, questions, comments, and your own arguments for or against this sort of legislation. Let’s continue the dialogue that makes our nation great!

Posted in Activism, Politics

A (Potentially) Essential Productivity Tool for Grad Students

Today is Saturday, so naturally I have spent several hours procrastinating on the web: watching TV series and YouTube series on YouTube, looking at job listings on Linguist List, visiting TVTropes for nearly the first time, etc.

Then I got to searching for some Chrome extensions. In particular, I was hoping to find an extension to block LSOs (Local Shared Objects), which are like special cookies built into Adobe’s Flash. The main difference is that they can store something like 100kB of data and are much more difficult to delete. (Here’s an article from How-To Geek with more details on getting rid of LSOs.)

This led me to discover StayFocusd, an extension for helping with productivity in just the way that grad students need most: blocking sites that suck up all your time. I’ve only spent a few minutes tinkering with it, but it’s very customizable. You can add blocked and allowable sites and set daily limits on how much time you should be allowed on those sites. You can also toggle days of the week and times of day you want the blocking to run. Of particular interest is the Nuclear Option, which lets you block everything that’s not on your allowed list for however many hours you’d like, and it can’t be undone! (There’s also an option to block even your allowed list.) There’s also a Require a Challenge option that makes it take a bit longer to change your options, helping you resist the temptation of changing your settings whenever you want a bigger fix.

I’ve set mine up to limit how much time I spend on YouTube and Facebook for now, and might add, say, this blog and Language Log and other sites to the list as well. For my allowed list, I have only Gmail, OhioLINK, and OSU-based websites. I’ll keep you posted on how well this works and whether or not my productivity improves as a result.

If you have any suggestions for similar add-ons or sites that help keep you focused, share them below! I’m particularly interested in similar extensions for Firefox and your experiences with these types of tools.

Posted in Uncategorized