Saturday, February 26, 2005

Kittens and Puppies

Per Dawn's request, I am posting again, and I am posting about puppies and kittens.

But now that I think about it, I really don't have much in depth to say about puppies or kittens. Puppies are small dogs. Kittens are small cats. I have had neither a puppy nor a kitten in my life for any great period of time. I see them now and then, and I pet them, and I make comments about how cute they are. I once dog sat my friend's dog while she was away in Florida. It was fine.

Sorry, Dawn, I'm just not sure there's much to be said about puppies and kittens that hasn't already been said ad nauseam. In closing, however, I provide an exemplar of a puppy and a kitten.

A puppy.

A kitten.

Sunday, February 13, 2005

Note to self and others

While reviewing MB's lecture notes before opening up the mailer envelope containing our midterm, I came across one of those classic plots of voice-onset time (VOT) versus the percent of the time the sound is perceived as the intended phoneme. In this plot, there is a sharp transition between perception of the sound as "bah" and perception of it as "pah" when the VOT passes a certain threshold. This is an example of categorical perception, or our tendency to break continuously varying quantities into discrete categories. I've recently felt there's something profound about the brain's ability to make the continuous discrete, and that issues in categorical perception are intertwined with a lot of arguments in politics and life. In fact, it seems that a common arguing tactic is to point out that the quantity separating two perceptually discrete categories is continuously varying, and thus the boundary between the categories is arbitrary. I can't recall the various times I've observed this phenomenon, but I feel like I've seen it a lot. One of my questions is whether this tactic--the invalidation of categories based on the arbitrary nature of the boundary between them--is valid or bogus.
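
The steep transition in those plots can be mimicked with a logistic function of VOT. Here's a toy sketch; the boundary location and steepness are made-up numbers for illustration, not values from MB's notes:

```python
# Toy model of categorical perception on a /ba/-/pa/ continuum:
# identification is a steep logistic function of voice-onset time (VOT).
# The boundary (25 ms) and steepness are invented for illustration.
import math

def percent_pa(vot_ms, boundary_ms=25.0, steepness=0.8):
    """Percent of the time a sound with this VOT is heard as 'pah'."""
    return 100.0 / (1.0 + math.exp(-steepness * (vot_ms - boundary_ms)))

for vot in (0, 20, 25, 30, 50):
    print(vot, round(percent_pa(vot), 1))
```

Even though VOT varies continuously, the curve hugs 0% below the boundary and 100% above it, with the whole transition packed into a few milliseconds--which is the discreteness-from-continuity point above.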

I guess I don't have anything more to say; I just wanted to make a note of these ideas so that I will hopefully remember to read about theories of categorization at some point in the future.

Monday, February 07, 2005

More on quantitative intellectual history

Everyone really ought to know what an Erdős Number is. You might also check out the Erdős Number Project. The ENP has a page on collaboration research in general. Then yesterday I saw this paper in the pile of unclaimed printouts (you can learn a lot by seeing what other people printed out but never picked up) and thought it looked interesting. Like many papers I think look interesting, I'll probably never get around to reading it.

Sunday, February 06, 2005

Stupid internet tricks

Following metalife's lead, I did a dumb quiz and I am posting the results on my blog.

You Are 32 Years Old

Under 12: You are a kid at heart. You still have an optimistic life view - and you look at the world with awe.

13-19: You are a teenager at heart. You question authority and are still trying to find your place in this world.

20-29: You are a twentysomething at heart. You feel excited about what's to come... love, work, and new experiences.

30-39: You are a thirtysomething at heart. You've had a taste of success and true love, but you want more!

40+: You are a mature adult. You've been through most of the ups and downs of life already. Now you get to sit back and relax.

What Age Do You Act?

Wednesday, February 02, 2005

Idea for a side project which will never get off the ground

Just as mathematicians have The Mathematics Genealogy Project, neuroscientists should have a Neuroscience Genealogy Project. (For a sample of what the MGP can do, put in the name of one of your college math professors and start clicking the "Advisor" links. He or she is probably not that many generations from someone mentioned in your textbooks, such as Weierstrass, Hilbert, or Dirichlet. Erdős and von Neumann had the same thesis advisor.)

Neuroscience, being much younger than math, should be easier to catalog. As I've become more a part of the field, I've realized that even though SfN annual meeting attendance has grown from 1,396 in 1971 to over 31,000 in 2004, everyone within a subfield still seems to know (or at least know of) everyone else. Famous people of today were often trained by famous people of yesterday, and incest seems to run rampant, with former labmates helping out each other's students. Many subfields in neuroscience are still comprehensible communities.

I'd like to see graphs showing the intellectual fathers and mothers of the field and their descendants, so that we could see whose intellectual traditions had influenced the largest number of today's neuroscientists. Links would be made from PIs to their graduate students and postdocs, and perhaps to their undergraduates too. From this we could see where an individual's influences came from, and whom that individual influenced. Links could maybe also be added to the graph between individuals who have carried on a significant collaboration.

So how to gather this data? I'm thinking about this because I think it could be done in a semi-automated fashion. We already have big publication databases like PubMed, ISI, and Google Scholar. I think that one could write a bot that could crawl these databases and use a few simple heuristics to infer, from bibliographic citations, the relationships that would make up the edges of a neuroscience genealogy digraph.

For each individual on the graph, the things to look at would be: (a) the order of authors on that individual's publications, (b) the frequency of coauthorship with another individual, and (c) where each frequent coauthor falls in the individual's career. For example, one's first papers are often written with one's graduate advisor. One may start out as a middle author on these papers, but a couple of papers should be published early on where the student is the first author and the advisor is the last author. So have the bot search PubMed for an individual's first publications, have it create a vertex for the most common last author on the individual's first few first-authored papers, and add a "graduate student" edge from the added vertex to this individual's vertex.

After doing this, take the next few first-authored publications and find the most common last author of these. Call this new last author our sample neuroscientist's postdoctoral advisor, make a new vertex for this advisor, and add an edge from it to our neuroscientist's vertex. Finally, when our neuroscientist begins a long string of last-authored publications, have the bot start looking at the first authors of these papers to determine our initial individual's students and postdocs. Do the same thing in the opposite direction, too: the bot could move further up the tree by inferring our initial neuroscientist's advisor's advisor in the same way that it inferred his or her advisor.
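
As a toy illustration of these heuristics--with all names and records invented, and the many edge cases (equal-contribution papers, alphabetical author orders, career gaps) ignored--a minimal sketch might look like:

```python
# Hypothetical sketch of the advisor-inference heuristics described above.
# Publications are (year, [author list]) tuples; all names are invented.
from collections import Counter

def infer_edges(pubs, person, window=3):
    """Guess grad and postdoc advisors from `person`'s first-authored papers."""
    # Chronological list of author lists where `person` is first (not last) author
    firsts = [authors for _, authors in sorted(pubs)
              if authors[0] == person and authors[-1] != person]
    edges = []
    early = Counter(a[-1] for a in firsts[:window])
    if early:
        # Most common last author on the earliest first-authored papers
        edges.append(("grad_advisor", early.most_common(1)[0][0]))
    later = Counter(a[-1] for a in firsts[window:2 * window])
    if later:
        edges.append(("postdoc_advisor", later.most_common(1)[0][0]))
    return edges

pubs = [
    (1995, ["Public JQ", "Smith A", "Advisor X"]),
    (1996, ["Public JQ", "Advisor X"]),
    (1997, ["Public JQ", "Advisor X"]),
    (1999, ["Public JQ", "Mentor Y"]),
    (2000, ["Public JQ", "Jones B", "Mentor Y"]),
]
print(infer_edges(pubs, "Public JQ"))
# [('grad_advisor', 'Advisor X'), ('postdoc_advisor', 'Mentor Y')]
```

The same routine, run on each newly added advisor, is how the bot would walk up the tree; looking at first authors of a person's last-authored papers would walk it down.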

Clearly these heuristics would need refinement and handling of edge cases, but I think they would do a not-so-bad job for most stereotypical neuroscience careers. Because there would be multiple "Public JQ"s in PubMed, one could separate the publications of the neuroscientist from those of the non-neuroscientists with, say, ISI's Journal Citation Reports list of neuroscience journals.

Oh, and the bot could be seeded at the beginning with the lists of faculty that are available on the websites of neuroscience departments.

My question is whether these heuristics would work well enough, or whether it might not be easier in the long run to spend the time up front creating a hand-labeled training set and then using statistical learning techniques to build a classifier from it. Each vector to be classified would represent the relationship between a pair of individuals. The dimensions of this vector would be ones like those I cited above (relative locations in lists of authors, location of the period of coauthorship in the timeline of an individual's career), and the output classes would be the possible edges, e.g., "Individual A is the graduate advisor of individual B." Now that I think about it, I suppose it would make sense to look at the entire publication record of both individuals when classifying an edge. That is, for Individual A to be the graduate advisor of Individual B, A should be last author and should already have gone through a period of first-authorship, and B should be first author and not have many publications yet.
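
To make the feature-vector idea concrete, here is an invented sketch of the features for one ordered pair (A, B), with a hand-tuned rule standing in for the trained classifier; all records, feature names, and thresholds are made up:

```python
# Hypothetical feature vector for the ordered pair (A, B), asking whether
# A is plausibly B's graduate advisor. Records are (year, [author list]);
# all data and thresholds are invented stand-ins for learned parameters.

def features(record_a, record_b, name_a, name_b):
    """Features computed from both full publication records, as suggested above."""
    joint = [p for p in record_a if name_b in p[1]]
    if not joint:
        return None
    start = min(year for year, _ in joint)  # when the collaboration began
    return {
        # Fraction of joint papers with A last author and B first author
        "a_last_b_first": sum(1 for _, au in joint
                              if au[-1] == name_a and au[0] == name_b) / len(joint),
        # A's first-authorships before the collaboration (a finished trainee?)
        "a_prior_firsts": sum(1 for y, au in record_a
                              if y < start and au[0] == name_a),
        # B's publication count when the collaboration began (a new student?)
        "b_prior_pubs": sum(1 for y, _ in record_b if y < start),
    }

def is_grad_advisor(f):
    """Toy decision rule; a real system would learn this from labeled pairs."""
    return (f is not None and f["a_last_b_first"] >= 0.5
            and f["a_prior_firsts"] >= 2 and f["b_prior_pubs"] <= 1)

record_a = [(1990, ["A", "X"]), (1991, ["A", "X"]),
            (1998, ["B", "A"]), (1999, ["B", "C", "A"])]
record_b = [(1998, ["B", "A"]), (1999, ["B", "C", "A"])]
print(is_grad_advisor(features(record_a, record_b, "A", "B")))
# True
```

The payoff of this framing is that the thresholds stop being guesses: given a hand-labeled set of pairs, any off-the-shelf classifier could fit them from these features.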

So clearly these ideas need fleshing out before they are remotely implementable. My feeling is that doing this with statistical learning is probably the better way to go, even though it would probably require one to go through the tedious process of building a fairly large training set up front. The genealogy of neuroscience does seem like the sort of problem a computer could be programmed to solve fairly reliably, though.