Ensuring unique subjects on Mechanical Turk

I recently ran an experiment on Amazon’s Mechanical Turk (MTurk) to collect some naturalness judgements from native speakers of American English. We wanted to make sure that each subject could submit only one survey, corresponding to one set of stimuli, but there didn’t seem to be a good way to do this with any of the default MTurk configurations. We had to either (1) post each set of stimuli as a different HIT (Human Intelligence Task) and use some kind of qualification system to make sure each subject could complete only one survey, or (2) post all of the surveys as the same HIT and host the survey on our own server.

We chose option (2), which uses the HIT as a gatekeeper: each HIT can have any number of assignments, but a given Turker (worker on MTurk) can complete only one assignment for each HIT. Our HIT used the ExternalQuestion format to fetch a survey from our server when a Turker accepted an assignment. A script on our server checked each request and provided either (a) the next available survey or (b) the survey that the Turker had originally been assigned and still needed to complete.
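The assignment logic in a script like that fits in a few lines. This is a minimal, hypothetical Python sketch, not the code we actually ran: it assumes each request carries the worker’s ID (MTurk appends a workerId parameter to the ExternalQuestion URL once a worker accepts the HIT), and it keeps its state in memory, whereas a real server would need to persist it and guard it against concurrent requests. The class and survey names are made up for illustration.

```python
from typing import Dict, List, Optional


class SurveyAssigner:
    """Hands each worker exactly one survey, and the same one on every request.

    Hypothetical sketch: state lives in memory here; a real server would
    persist it (e.g. in a database) and lock it against concurrent requests.
    """

    def __init__(self, survey_ids: List[str]) -> None:
        self._unassigned = list(survey_ids)   # surveys nobody has been given yet
        self._by_worker: Dict[str, str] = {}  # worker ID -> assigned survey

    def assign(self, worker_id: str) -> Optional[str]:
        # (b) A returning worker gets the survey they were originally given.
        if worker_id in self._by_worker:
            return self._by_worker[worker_id]
        # (a) A new worker gets the next available survey, if any remain.
        if not self._unassigned:
            return None
        survey = self._unassigned.pop(0)
        self._by_worker[worker_id] = survey
        return survey


if __name__ == "__main__":
    assigner = SurveyAssigner(["list1", "list2"])
    print(assigner.assign("A1"))  # list1
    print(assigner.assign("B2"))  # list2
    print(assigner.assign("A1"))  # list1 again, not a new list
    print(assigner.assign("C3"))  # None: every list is taken
```

One gap worth noting: in this sketch, a worker who accepts and then returns the HIT keeps their survey, so that set of stimuli is never handed out again. In practice you’d want some policy for recycling abandoned surveys.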

The code I wrote to do this isn’t ready for public dissemination (I had to jump in manually to tweak things here and there during run-time; ew), but shoot me a message if you want more information. Ideally, I’d like to assemble a group of researchers willing to contribute to a public codebase on GitHub so that we can make these research methods more accessible to our less technical colleagues and refocus our efforts on research rather than infrastructure.

Posted in Research

Boxes, blind men, and elephants

In his blog post Putting people in boxes, John Cook describes the plight of the interdisciplinarian when seeking, for example, new employment. Academics must regularly deal with emboxment: we try to describe our research in ways that are both expressive of nuance and compatible with preconceived notions about what research is worthwhile. John’s examples are limited to instances where we’re trying to please a single gatekeeper, but the reality is that we’re often seeking to please multiple audiences: friends & family, peers & colleagues, and university administration, just to name the most common.

I don’t think this is a bad thing. We might think that compressing our research into a few pages for a grant proposal, or a list of papers for an administrator’s review, or an accessible blurb for our loved ones is a bit like introducing blind men to an elephant, but this story is one of the better versions of that tale. In our telling, we are guiding the blind men to discover the parts they are most interested in. This helps them get the most from their encounter with the creature and likely shows us something new about it as well. Retelling the saga of our research helps keep us focused and reminds us of the larger context we’re working within. And all by narrowing the scope of the description. Interesting how that works, isn’t it?

Posted in Research

Short Note: “Aw, Snap!” in Google Chrome on Twitter, Google Calendar, etc. (in Linux)

Running Fedora 15, I had a problem around October of last year where Chrome crashed any time I went to Twitter or Google Calendar (among other sites). I somehow resolved it, but couldn’t remember how when it resurfaced after my upgrade to Fedora 16. Somewhat interestingly (maybe?), I don’t think the problem arose immediately after the upgrade; it may have only come up after I updated Chrome as well.

Anyway, I went through all of Google’s recommended procedures for dealing with the “Aw, Snap!” page, to no avail. After trying a number of other suggestions, I had given up and figured I’d just use Firefox for these sites. A few days (weeks?) later, I decided to give it a few more minutes of my time and found a site with a useful answer.

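So this is the solution that worked for me. restorecon resets files to their default SELinux security contexts, and -R applies it recursively, here to everything under ~/.config, where Chrome keeps its profile on Linux: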

restorecon -R ~/.config

Posted in Uncategorized

Ranking Sentences, and doing it *right*

For a project I’m working on, I’ve got a collection of paired sentences where one is classified as more complex than the other. I’ve got some thoughts about interesting features to use in the context of measuring sentence complexity, but before delving into those, I’m running a sanity check: if really simple things (e.g. word length, sentence length) can reliably (i.e. more than 80 or 90% of the time) indicate whether a sentence is simple or complex, then there’s less to be gained by using the more involved measures I’m interested in.
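For concreteness, the “really simple” features could look something like this. It’s a minimal Python sketch, assuming whitespace tokenization (so punctuation stays attached to words and counts toward word length); the function and feature names are hypothetical. It returns a plain dict because NLTK’s classifiers take each training example as a (featureset dict, label) pair.

```python
from typing import Dict


def baseline_features(sentence: str) -> Dict[str, float]:
    """Shallow complexity cues for one sentence.

    Hypothetical feature set: sentence length in tokens and mean token
    length in characters, using naive whitespace tokenization.
    """
    words = sentence.split()
    n_words = len(words)
    # Guard against empty input so we never divide by zero.
    avg_word_len = sum(len(w) for w in words) / n_words if n_words else 0.0
    return {"sent_len": float(n_words), "avg_word_len": avg_word_len}
```

For single-sentence classification, each example would then be something like (baseline_features(s), "simple").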

Anyway, I started out treating this as a classification problem using NLTK’s NaiveBayesClassifier and MaxentClassifier and labelling each sentence as either simple or complex. This was pretty dismal at single-sentence classification (~61-63%) but did alright on larger chunks (~91-93% for texts consisting of 25-30 sentences). But my dataset consists of paired sentences, and I’m interested in applications where the question I’m asking is, “which of these two sentences is more complex?” I want to place them on a scale, if you will.

To that end, I started using SVMrank and felt I was getting pretty good results: just a 1% improvement (about 300 sentence pairs in this dataset), but I was using a somewhat hackish imitation of the feature I really wanted to add, so 1% was a promising start. Upon further reflection, however, I couldn’t recall exactly which parameters I’d passed to which runs of SVMrank, so I methodically reran my tests and, lo and behold, the difference disappeared: my 1% gain shrank to 0.1%. In the course of messing around with things, I also discovered a 20% drop in accuracy when I trained on more data, which sounds like overfitting, but wasn’t a problem I anticipated with SVMs.
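For anyone trying a similar setup, here’s a sketch of how paired data might be written out for SVMrank. Its input has one example per line: the target value, then qid:n, then index:value feature pairs. SVMrank only compares examples that share a qid, so giving each sentence pair its own qid restricts the learned ranking to within-pair comparisons. In this sketch the complex sentence gets the higher target; the function name and the feature-vector layout are assumptions for illustration.

```python
from typing import List, Sequence, Tuple

# One sentence pair: (feature vector of the simple sentence,
#                     feature vector of the complex sentence).
Pair = Tuple[Sequence[float], Sequence[float]]


def to_svmrank_lines(pairs: Sequence[Pair]) -> List[str]:
    """Format sentence pairs as SVMrank training lines (hypothetical helper).

    Each pair becomes its own query (qid), so SVMrank only ranks the two
    sentences of a pair against each other. The complex sentence gets
    target 2 and the simple one target 1.
    """
    lines = []
    for qid, (simple_vec, complex_vec) in enumerate(pairs, start=1):
        for target, vec in ((1, simple_vec), (2, complex_vec)):
            # Feature indices start at 1 and increase within a line.
            feats = " ".join(f"{i}:{v:g}" for i, v in enumerate(vec, start=1))
            lines.append(f"{target} qid:{qid} {feats}")
    return lines


if __name__ == "__main__":
    for line in to_svmrank_lines([([10.0, 4.5], [25.0, 5.2])]):
        print(line)
    # 1 qid:1 1:10 2:4.5
    # 2 qid:1 1:25 2:5.2
```

Generating the file from a script also leaves a record of exactly which features went into each run; saving the svm_rank_learn command line (including the -c value) next to it would help with the “which parameters did I pass to which run?” problem.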

So now I’m back to sanity checking using NLTK and examining paired sentences where I have either the complex sentence first or the complex sentence second. In classifying these samples as “complex left” or “complex right”, the NB classifier is getting around 71% using the most basic features (the ones that were giving 61% on single sentence classification as simple or complex). Given my recent experiences with SVMrank, though, I’m hesitant to assume this means what I’d like it to mean and move on to the next problem.
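Here’s a minimal sketch of how “complex left” and “complex right” examples might be built from one pair. It assumes each sentence already has a dict of numeric features (the names are hypothetical) and compares them as booleans, since NLTK’s NaiveBayesClassifier treats feature values as discrete categories and would handle raw floating-point differences poorly.

```python
from typing import Dict, List, Tuple

Features = Dict[str, float]
Example = Tuple[Dict[str, bool], str]


def compare(left: Features, right: Features) -> Dict[str, bool]:
    # One boolean per feature: is the left sentence's value larger?
    # Booleans suit Naive Bayes, which treats values as discrete categories.
    return {f"left_more_{name}": left[name] > right[name] for name in left}


def pair_examples(simple: Features, complex_: Features) -> List[Example]:
    """Both orderings of one pair, labelled by where the complex sentence sits."""
    return [
        (compare(complex_, simple), "complex left"),
        (compare(simple, complex_), "complex right"),
    ]
```

The resulting list can go straight into nltk.NaiveBayesClassifier.train(). One caution when generating both orderings: keep the two mirror-image examples of a pair on the same side of the train/test split. Otherwise the test set contains the reversed twin of a training example, which can make the accuracy look better than it is.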

Any thoughts on SVMrank and why I might be seeing overfitting-like behaviour? Any recommended tutorials for working through this kind of ranking problem? Comment below!

Posted in Research

To the Senator’s Office

A couple of weeks ago, as I was investigating PIPA and SOPA online, I found myself on americancensorship.org, clicking through to the page for Ohio. There I found contact information for my Senators and a message board to coordinate with fellow Ohioans under the banner of Ohio for Internet Freedom. Over the course of the subsequent weeks this hodgepodge group of folks seeking to do something about P.R.O.T.E.C.T. IP managed to arrange a meeting with Senator Sherrod Brown‘s Deputy State Director, Beth Thames. This is the story of our meeting with her yesterday morning.

I arrived in Cleveland at around 9am and found the Senator’s office without a hitch. By about twenty to ten my fellow organizers, Matt Stafford and his father, had arrived and we chatted amongst ourselves until Beth came out to meet us. She greeted us with a smile and put us at ease as she welcomed us into her office to meet around a small table to discuss our concerns with the P.R.O.T.E.C.T. IP Act.

Matt suggested that I take the lead in our discussion, and so I opened by asking Beth for more details regarding the Senator’s position on the bill: why does he support it? what aspects of it does he believe will do good? While she could not officially comment for the Senator as to his position, she did say, as a member of his staff for fifteen years, that she believes his interest in the bill is at least in part driven by a desire to protect consumers. Noting that this was not an exhaustive representation of his support for the bill, she mentioned that he wanted to fight, among other things, the availability of counterfeit prescription drugs in the United States. I responded that, if his concern was counterfeit drugs, a more targeted bill might help him achieve those goals, and I was reminded that this was not his only reason for supporting the bill.

From there I launched into my argumentation against P.R.O.T.E.C.T. IP, running along two major themes: the bill is anti-competition and grants the government censorship powers. To explain the former, I cited the MPAA and RIAA as examples of the bill’s supporters. These industry groups represent what was, once upon a time, the new media: when recording came of age last century, it supplanted an entertainment industry driven by live performance. Now that these industries have established themselves, they don’t want to have to compete with the alternative distribution models that have become popular in the last decade: they don’t want to have to innovate. There was a bit more to flesh out the argument than that, but that’s the crux of it. I did not mention at the time the oft-cited figures of soaring profits in these industries despite (or, perhaps, caused by?) piracy and advertising-based distribution of their media for “free”.

After some discussion I went into the censorship component of the bill: we do not want to grant the government, let alone private companies, the ability to censor the internet. Even with discussion in the past week of dropping the DNS provisions, the bill grants the power to cut off search engine leads and advertising revenue, which is plenty to kill a website.

The web today is driven by user-generated content. This is what has spurred the internet’s growth in the past decade. But P.R.O.T.E.C.T. IP disincentivizes entrepreneurship. If I want to start a new site based on user-generated content, I have to hire dozens of additional people to moderate user comments or develop technology to automate this process. If I choose not to do this, I risk having users’ comments get my site blacklisted. Even if my domain name still resolves, losing search engine traffic or advertising revenue could put me, and all my employees, out of a job in a heartbeat. The business climate PIPA creates would stifle innovation and entrepreneurship on the web.

Following this initial line of argumentation, I invited Matt to share his views as well. Bringing up the DNS provisions, he pointed out that the bill runs counter to DNSSEC, a standard heretofore supported by the US government and designed to secure the internet against high-level hacking. Beyond the DNS component, we brought up the effect this has on consumers. For the technically savvy, there will be workarounds (we already have Tor for getting around the Great Firewall of China). For the less technically savvy, it’s likely that hacked versions of these workarounds will proliferate: if they’re not careful in what they download, they may well get a virus with their censorship-workaround software. For the still less technically savvy, these things won’t even be an option: their experience on the internet will simply never be the same.

This effect spills over into business as well. The United States of America is still a world-leader in online technologies. Innovations in our tech sector drive the industry forward. But if we start burdening search engines and advertisers with these additional responsibilities, we could see them start moving abroad. “Easiest way around PIPA-firewalls? Use google.fr!” or some other service hosted elsewhere. Until the government starts blocking them too.

At a few points throughout our meeting, we did joke with Beth that this would put us in the same league as China and Iran. This led us to acknowledge the political rhetoric just waiting to be let loose: these bills are fundamentally un-American—stifling the free-market and bringing about government censorship. Connecting this theme to the threat of businesses leaving the US, we discussed Google’s withdrawal from mainland China following years of attempts to compromise their ethics sufficiently to operate in the People’s Republic.

Matt also brought up his job at the Examiner, and the prospect of lost income not only for businesses but for individuals who freelance. The effects of the bill are far-reaching and would hinder his ability to contribute to the local economy of Cleveland. He also brought up the government-condoned vigilante justice portion of the bill. This portion, Section 5, encourages ISPs to violate our privacy and investigate the contents of our data transfers by saying that they shall not be held accountable for actions taken “in good faith” to serve the purposes of PIPA. This led me to suggest that the Senator should support a bill for net neutrality, making the ISPs utility providers who should provide “dumb pipes” to transfer our data, regardless of the content.

At the end of the meeting, Beth asked us if there was anything in this bill that we thought could be saved. I said yes. What can be saved is the spirit of support for law enforcement. I see this bill as a cry for help from law enforcement and the DOJ as much as a concerted anti-competition effort on the part of certain industries. But we can support law enforcement with additional resources and staff without new legislation.

I did also say that incorporating more due process in the bill might make it more palatable, though even with a trial in place rather than the mere request of the Attorney General or a rights-holder this amounts to a censorship bill. I stressed this final point heavily: the bill is fundamentally a bill to censor the internet, and that is entirely unacceptable.

Deputy State Director Beth Thames thanked us for sharing our concerns. Having taken our contact information at the beginning of the meeting, she let us know that the Senator’s office would be in touch sometime soon with their response to our meeting. I look forward to receipt of this information and will share what I can at that time.

Thanks for taking an interest in protecting our freedoms online, thanks for reading my tale. Please share your thoughts on the meeting, questions, comments, your own arguments for or against this sort of legislation. Let’s continue the dialogue that makes our nation great!

Posted in Politics

A (Potentially) Essential Productivity Tool for Grad Students

Today is Saturday, so naturally I have spent several hours procrastinating on the web: watching TV series and YouTube series on YouTube, looking at job listings on Linguist List, visiting TVTropes for nearly the first time, etc.

Then I got to searching for some Chrome extensions. In particular, I was hoping to find an extension to block LSOs (Local Shared Objects), which are like special cookies built into Adobe’s Flash. The main difference is that they can store something like 100kB of data and are much more difficult to delete. (Here’s an article from How-To Geek with more details on getting rid of LSOs.)

This led me to discover StayFocusd, a Chrome extension that helps with productivity in just the way grad students need most: blocking sites that suck up all your time. I’ve only spent a few minutes tinkering with it, but it’s very customizable. You can add blocked and allowed sites and set daily limits on how much time you’re allowed on those sites. You can also choose which days of the week and times of day the blocking runs. Of particular interest is the Nuclear Option, which lets you block everything that’s not on your allowed list for however many hours you’d like, and it can’t be undone! (There’s even an option to block your allowed list too.) There’s also a Require a Challenge option that makes it take a bit longer to change your options, helping you resist the temptation to change your settings whenever you want a bigger fix.

I’ve set mine up to limit how much time I spend on YouTube and Facebook for now, and might add, say, this blog and Language Log and other sites to the list as well. For my allowed list, I have only Gmail, OhioLINK, and OSU-based websites. I’ll keep you posted on how well this works and whether or not my productivity improves as a result.

If you have any suggestions for similar add-ons or sites that help keep you focused, share them below! I’m particularly interested in similar extensions for Firefox and your experiences with these types of tools.

Posted in Uncategorized

Howcroft’s Soon-to-be-famous Chili

Going to Kroger today to buy groceries, I realised that I had forgotten to look up a good chili recipe before going. So naturally I decided to wing it. How hard can it be to make some veggie chili? Anyway, what follows is the very simple recipe I came up with. (I’d also forgotten to look up a good curry recipe, and some others, but I’m not quite adventurous enough to take those on without instruction.)

The Recipe

  • Chili Beans (2 cans) – I figured I’d start with the obvious and go from there. I used Bush’s Medium Chili Beans, to get some built in seasoning.
  • Black Beans (1 can) – Just used the Kroger brand for this.
  • Kidney Beans (1 can) – Again, the Kroger brand.
  • Great Northern Beans (1 can) – You guessed it. (Kroger!)
  • Diced Tomatoes with Green Chilis (1 small can) – This worked out pretty well; just the Kroger brand.
  • Prego Traditional (1 24oz. container) – I used pretty much all of this as the tomato base for the chili.
  • Tone’s Garlic and Herb Seasoning (2 shakes) – Just from what I had lying around.
  • Salt (5 twists of the grinder) – Obviously adjust this to your taste.
  • Black Pepper (something like 15 twists of the grinder) – Again, adjust to taste.
  • Thai Kitchen Red Curry Paste (a smidge) – This was just to make the recipe uniquely mine. I just scraped some from the top of the jar and used that spoon to mix everything thoroughly.

When it was all thrown into the pot and mixed together, I let it sit on low heat for 2-3 hours on the stovetop. I served it with saltines, and next time I think I’ll add some cheese.

What are your favorite chili recipes? Do you have any suggestions for future iterations of my chili?

Posted in Food

First Update of the New Year

Trained to Type

I recently completed training with the College of Arts and Humanities tech group to become a Drupal Content Editor for the OSU Department of Linguistics website. While I was initially skeptical of the value of this enterprise, having spent a bit of time kicking around Drupal and building websites of my own, the experience wasn’t a drag. I got what I needed out of it in terms of familiarizing myself with the actual use of Drupal (as opposed to the post-installation module experimentation I got caught up in when I ran it for myself), and I also enjoyed their presentation of accessibility concerns. I haven’t taken the time to examine this WordPress theme, so I don’t know how sincere this will seem, but accessibility is something that I’ve always valued and taken seriously. This is part of the reason that I got into reading A List Apart and studying web standards. (The rest of the reason being that I wanted to act all superior to the heathen masses that just kludge it all together ^.~)

Anyway, that’s all to say that I can now take up my role more actively within the department web committee. Well, that, and that I’ll probably be posting more about the web and its design.

Christmas and the like

The break between quarters was lovely: I travelled to Georgia to spend time with Ashley during her finals before we headed to Florida together for the holidays. While Mum, Dad, and Sam spent Christmas in South Africa visiting family, especially our dearest Granny Joyce, I went to Miami with Ash and her dad for Christmas. Staying with her Grami, we played lots of card games as well as some Bananagrams and Appletters. Then it was back up to Orlando just in time for me to get sick, for us to celebrate New Year’s with her mum, and for us to drive back up to Georgia. And the next day I got to drive to Ohio. Yay.

A New Semest–er, Quarter

I’m still getting used to this whole quarter system thing, and I’m sure I’ll become accustomed to it just before OSU makes the semester conversion (in 2012). This quarter I’m enrolled in courses on semantics, phonology, and computational linguistics (four classes total; I’ll let you guess which two overlap). I’m also regularly attending Clippers, the compling working/discussion group.

More Regular Updates?

I won’t call it a New Year’s Resolution, but I want to try to post more; I think it’ll help me focus on my goals and advance the style of this blog beyond that of a middle-school diary. In this regard, I plan to write about some of the books I’ve been reading lately (The Black Swan, Freakonomics, Why We Make Mistakes, How We Decide, etc.) and my coursework and research. We’ll see how all that goes.

Until next time: Peace!

Posted in Days of my Life, Family and Friends

Ahh, the sweet taste of progress

Classes have been in session for about a month now, and I’ve definitely been monologuing about life a lot less than I’d hoped. Here’s a post to remedy the situation.

Syntax Squib

Syntax has been going pretty well, though I still don’t feel that I’ve got the best grasp on some of the material. I think it basically boils down to the fact that there’s a lot going on with language and it’s hard to pin down. I mean, I guess that’s why there’s a field devoted to it that’s been hard at work for 50-odd years.

I think I’ve got my topic for my squib, though. For those who don’t know, a squib is not just a muggle born to wizarding parents: a squib is a short (5-10pg) research paper on a topic in linguistics. Presumably the type of paper isn’t limited to syntax or linguistics in particular, but I have no outside confirmation of this intuition.

Anyway, my topic is a construction that I’ve encountered since my arrival in Ohio: sentences like “The table needs wiped down”. This needs + Verb.PastParticiple construction is new to me, and caught me off-guard the first time I heard it. I’ll be examining a couple of different potential explanations of the phenomenon to determine what exactly is going on here. In particular, what makes this dialect of English different from my own native dialect? Does introducing these changes to the grammar predict some other forms and, if so, do these in fact occur?

I’ll keep you posted as the paper progresses.

NSF Graduate Research Fellowship Application

I’m also working on my application for the NSF GRFP. For those of you who aren’t familiar with the program, here’s a brief description of the program from its website:

The NSF Graduate Research Fellowship Program (GRFP) helps ensure the vitality of the human resource base of science and engineering in the United States and reinforces its diversity. The program recognizes and supports outstanding graduate students in NSF-supported science, technology, engineering, and mathematics disciplines who are pursuing research-based master’s and doctoral degrees at accredited United States institutions.

The fellowship is incredibly competitive, and so I’m in the process of refining my ideas of a suitable research program with my advisor and others in my department. I basically just want to have the most competitive application I can. If any of you are interested in reading my proposal or in giving me feedback, please feel free to email me at [mylastname]–AT–ling[dot]ohio-state(dot)COM.

And Most Importantly…

Ashley and I recently celebrated our sixth anniversary, so she flew up to Columbus to visit me this weekend! We had a wonderful time and got to see more of the Columbus area than I’d explored yet. Some highlights include: painting ceramics at Color Me Mine in Dublin, a wonderful dinner at a fancy restaurant downtown, and seeing the President speak on the Oval on Sunday!

Posted in Family and Friends, Research, Syntax

First Day of Graduate School

Classes started yesterday, my first graduate classes. I also learnt why I even bothered to come to grad school in the first place. It turns out that it requires post-graduate education to learn that it’s a bad idea to go swimming with your cell phone…

Anyway, brief rundown of the day… I started out with some meditation right after waking up, a habit (the meditation, that is; not necessarily the time) that I’d like to develop further. After that I had some toast and peanut butter for breakfast before running off to the RPAC to begin forming a habit of daily exercise. Swimming was wonderful, but I wasn’t feeling too hot afterward, so I’ve got some experimenting to do to figure out whether that’s because of the proximity to waking, eating so soon before working out, or something else. Then I had to rush home (because I dawdled, feeling too nauseated not to) to shower before class.

First Class

My first class as a graduate student was Syntactic Theory I with Peter Culicover (LINGUIST 602.01, for those looking at the OSU course list). We’re using his book, Natural Language Syntax, which is one of the ones that I started reading a while ago. It’s just as well, too, since our first assignment (based on the first two chapters) is due on Monday. We spent the better part of the session discussing wh-movement and sluicing to explore what kinds of questions get asked in syntax and how we might answer them.

Between this and the next class I ran for lunch and got an Italian Sub from the Creamery by Mirror Lake, which was pretty delicious.

Second Class

My second grad course was Computational Linguistics I, being taught by William Schuler and using Speech and Language Processing. Unfortunately, the first class was spent going over finite state automata for the umpteenth time in my life. Definitely not the end of the world, as it lays the foundation for our future discussions (of regular expressions and hidden Markov models in particular).

Overall, classes went pretty smoothly. We had more students in Syntax than I expected, but that’s because some people from psychology and some undergrads also take the class. 684.01 (CL I) was much more what I expected from a graduate course, having something like 10 students.

The Rest of the Day

After an afternoon spent at home to meet the Terminix man and a mattress cleaner for my roommates, I headed to the department to hang out for a while during our wonderful tornado watch. Living in a third floor apartment, I would much rather be at the department than in my apartment during those alerts. (Even if it does make me feel pretty ridiculous, like all those people who freak out over hurricanes.)

From there I walked to the Newman Center to join the Ice Cream Social they were having and met some lovely people. I got to chat a little (and only very briefly) with Frs. Chuck and Joe and also managed to meet some fellow students, both grad and under-.

I’m still trying to figure out exactly how involved I want to be with all the different groups available here–I kind of want to see how classes go for two or three weeks before really committing to anything–but there are a few things I like at the Catholic center. One is the Muslim-Catholic dialogue that meets weekly and promises to become a more diverse interfaith group as we can find more people to join. Another is the Spiritual Life committee which I hope to join to get some Taizé services going. Oh! And there’s also a group that meets Mondays and Tuesdays to do contemplative prayer (i.e. meditation)! I completely forgot about them… Finally, I’ve been sold on the idea of going to the Buckeye Awakening retreat in November.

Today and Comments

Friends. Feel free to comment on any aspect of this or any entry, but bear in mind that the parts that seem mundane and trivial to you are a part of my life: some things are recorded more for my benefit than for yours. To help you avoid such unpleasantness as reading about my day(!), I’ll categorize these entries as “dailies” or something like that, and you can ignore them and instead look for analysis and opinions or other responses to specific incidents or ideas (which will be differently categorized or uncategorized).

Today I’ve only one class, Formal Foundations, but I’m also going to check out the Pragmatics group that meets this afternoon. More when I have it to report!

Peace

Posted in Days of my Life