Scirate Journal Club

This project ranks the past week's arXiv papers to select a discussion topic for a weekly coffee/journal club event.

Papers are ranked on Scirate: each member of the workgroup has their account name added to the list of users in the script. The script scrapes their Scites on a weekly basis and tallies the votes for papers from the past week. The resulting output can then be presented on a website, informing everyone of the upcoming discussion topic.
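The "past week" window is enforced by parsing each scite's date and comparing it against a cutoff seven days back; `time.strptime` returns a `struct_time`, which compares element-wise like a tuple. A minimal sketch of that check, with a made-up scite date:

```python
import time
from datetime import date, timedelta

# Cutoff seven days before today, as a struct_time.
last_week = (date.today() - timedelta(days=7)).timetuple()

# A hypothetical scite date in the "Mon DD YYYY" style the script expects.
scited = time.strptime("Jan 02 2006", "%b %d %Y")

# struct_time compares like a tuple, so this is effectively a date comparison.
print(scited >= last_week)  # False: a 2006 date falls well outside the window
```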

Hopefully one day Scirate will have an API so this scraping is unnecessary! You can view our journal site, Friday Journal Meeting, as a sample.

User Guide

  1. Create an account at scirate.com. Note your username (if you visit your Profile page, it appears directly beneath your name) and email it to whoever administers the journal club.

  2. Customize the feeds on Scirate to topics that you are interested in. Our workgroup typically reads "hep-th" and "gr-qc", and some use "quant-ph".

  3. Sift through articles at your own pace; the script looks for papers from the past week, so make sure you don't put papers off for more than a week. When you mark a paper with the "Scite!" button, that will count as a vote from you towards the next meeting's topics.

  4. Visit the meeting page, for example, Friday Journal Meeting, to see the top articles pending discussion.
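Behind the scenes, each Scite from the past week counts as one vote, and ties at the top are broken at random; the script seeds the random generator with the ISO week number so the same winner is picked all week. A small sketch of that tie-break with hypothetical vote counts (the IDs and week number are made up):

```python
import random

# Hypothetical weekly tally: arXiv IDs -> number of Scites.
votes = {"2101.00001": 3, "2101.00002": 3, "2101.00003": 1}

# Seeding with the ISO week number (here, a made-up week 14) keeps the
# random tie-break stable for the whole week.
random.seed(14)

top = max(votes.values())
tied = [paper for paper, n in votes.items() if n == top]
winner = random.choice(tied)
print(winner)  # one of the two papers with 3 votes
```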

Code


#!/usr/bin/python
import operator
import random
import urllib2
from datetime import date
from datetime import timedelta
import time
from BeautifulSoup import BeautifulSoup

# Scirate usernames of the journal-club members.
Users = ["wilson-brenna"]

today = date.today()
last_week = (today - timedelta(days = 7)).timetuple()

# Seed with the ISO week number so random tie-breaks are stable within a week.
random.seed(today.isocalendar()[1])

votes = {}

for user in Users:
	print user
	url = "https://scirate.com/" + user + "/scites/"
	soup = BeautifulSoup(urllib2.urlopen(url).read())
	# Each scited paper exposes its BibTeX entry and the date it was scited.
	articles = soup.findAll('textarea', {'class': 'bibtex'})
	dates = soup.findAll('div', {'class': 'uid'})
	for index, element in enumerate(articles):
		# The arXiv identifier sits just after the opening "@misc{" of the BibTeX.
		eprint = element.contents[0].rstrip()[6:16]
		dateString = dates[index].contents[0].rstrip()
		actualDate = time.strptime(dateString, "%b %d %Y")
		# struct_time compares like a tuple; skip scites older than a week.
		if actualDate < last_week:
			continue

		if eprint not in votes:
			votes[eprint] = 1
		else:
			votes[eprint] += 1

# Pick the top three papers, breaking ties at random (seeded per week above).
# The original three copy-pasted blocks mixed tab and space indentation, which
# Python rejects; a loop is both correct and shorter.
f = open('votes.txt', 'w')
for label in ['first', 'second', 'third']:
	if votes:
		maxVotes = max(votes.itervalues())
		pick = random.choice([k for k, v in votes.items() if v == maxVotes])
		del votes[pick]
		f.write('<a href=\'http://arxiv.org/pdf/' + pick + '.pdf\'>' + pick + '</a><p />')
	else:
		f.write('Too few votes to choose a ' + label + ' paper.<p />')
f.close()
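The eprint slice in the scraping loop assumes each BibTeX entry opens with "@misc{" followed immediately by the arXiv identifier, so that characters 6 through 15 are the ID. A quick check with a made-up entry using a new-style ten-character identifier:

```python
# Made-up BibTeX of the shape the script assumes Scirate serves.
bibtex = "@misc{2101.00001,\n  title = {An Example Paper}\n}"

# Skip the six characters of "@misc{" and take the ten-character arXiv ID.
eprint = bibtex.rstrip()[6:16]
print(eprint)  # -> 2101.00001
```

Note that older arXiv identifiers (e.g. "hep-th/9901001") are a different length, so this fixed slice would need adjusting for them.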