Papers Project
Sat Mar 23, 2019
The Toronto Computer Science Reading Group, also known as the CS Cabal, is getting quasi-serious, and also mildly stir-crazy. We've been toying with three things: writing a paper recommendation engine, a Turing-complete dance, and picking the next book we're reading. I'm not particularly involved in the lambda dance, believe it or not. I think dxnn might eventually post a repo of notes, but we don't have anything that exists off a physical whiteboard yet. As such, I will say no more about this.
The Next Book
The vote for the next book we're reading is currently in progress. The criteria for candidates were
- A book whose title is a four-letter acronym
- That is freely available online
- That would be interesting to go through in a group context
The astute will notice that there's no Computer Science content requirement in those criteria, which is why I jokingly suggested TAOW. The serious candidates, in no particular order, are
I think my preference is probably for Paradigms of Artificial Intelligence Programming. PLAI1 might be interesting, in that we've read PFPL2, but not interesting enough for me to push it. The vote should be concluded by the end of the weekend, so I'll be able to tell you what we're doing next week.
The Recommendations
At some point we realized that we were a group of loosely affiliated people with limited but focused free time who had managed to seriously expand our collective and individual understanding of various computer science material over the course of about eight years. From that position, we might be able to put together a curriculum for other people in similar situations to do the same. The beginnings of that curriculum are here, but before we got to any sort of end state, we had a new-idea/scope-creep moment.
Could we automate that?
In other words, could we write the program that, given a topic and your current level of understanding of that topic, could put together a set of readings and exercises that would get you from where you are to expert status?
In dxnn's words, "How hard could it be?" The beginnings of that project are over here. Our next step is to build a classification engine that eats everything in the arXiv Comp-Sci section and uses Doc2Vec to figure out how to cluster papers. That doesn't give us the solution on its own; we also need to figure out how to rate the complexity of different papers (or possibly find a way of decomposing different subsections of Comp-Sci into required concepts and work out a complete walk of them). Oh, and then also figure out a simple way of testing someone to see where we should start them off.
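To make the "required concepts and a complete walk" idea a little more concrete, here is a minimal sketch. It assumes the concepts and their prerequisites are already known and written down by hand, which is exactly the hard part nobody has solved yet. The concept names, the `PREREQUISITES` map, and the `reading_order` helper are all hypothetical, and the ordering uses Python's standard-library `graphlib` (3.9+) rather than anything from the actual project.

```python
from graphlib import TopologicalSorter

# Hypothetical prerequisite map: concept -> concepts to understand first.
# In the real project this would have to be derived from the papers somehow.
PREREQUISITES = {
    "lambda calculus": set(),
    "type systems": {"lambda calculus"},
    "interpreters": {"lambda calculus"},
    "type inference": {"type systems", "interpreters"},
}


def reading_order(prerequisites, already_known=frozenset()):
    """Return concepts in an order that respects prerequisites.

    Concepts in `already_known` are skipped, which stands in for
    "your current level of understanding".
    """
    # static_order() yields every prerequisite before the concepts that need it.
    order = TopologicalSorter(prerequisites).static_order()
    return [concept for concept in order if concept not in already_known]


if __name__ == "__main__":
    print(reading_order(PREREQUISITES))
    print(reading_order(PREREQUISITES, already_known={"lambda calculus"}))
```

A topological order is only one possible reading of "walk"; it says nothing about how difficult each concept is, which is the separate complexity-rating problem.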
How hard could it be, right?
The only thing I've learned so far is that, for whatever reason, the gensim implementation of Doc2Vec doesn't allow passing a generator as the initial data sequence. This is unlike their implementation of Word2Vec. I have no idea yet whether this is some implementation detail that snuck by, or whether it's intrinsic to the Doc2Vec approach. I guess I'd better read the related papers first.
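One possible explanation, offered as an assumption rather than something checked against the gensim source: training makes more than one pass over the data (a vocabulary scan, then one pass per epoch), and a plain generator is exhausted after its first pass. If that is the issue, the usual workaround is to wrap the generator function in an object whose `__iter__` starts a fresh pass every time. The sketch below is pure Python and deliberately avoids calling gensim; `RestartableCorpus` and `read_papers` are hypothetical names, and the two sample strings are made up.

```python
class RestartableCorpus:
    """Wrap a generator *function* so every iteration starts a fresh pass."""

    def __init__(self, make_generator):
        self.make_generator = make_generator

    def __iter__(self):
        # Calling the function again builds a brand-new generator.
        return self.make_generator()


def read_papers():
    # Hypothetical stand-in for streaming abstracts from disk.
    # With gensim, each item would presumably be wrapped as
    # gensim.models.doc2vec.TaggedDocument(words, tags).
    papers = [
        "doc2vec learns paragraph vectors",
        "word2vec learns word vectors",
    ]
    for tag, text in enumerate(papers):
        yield (text.split(), [tag])


# A bare generator can only be consumed once.
one_shot = read_papers()
first_pass = list(one_shot)
second_pass = list(one_shot)

# The wrapper can be iterated as many times as a trainer needs.
corpus = RestartableCorpus(read_papers)
pass_a = list(corpus)
pass_b = list(corpus)

if __name__ == "__main__":
    print(len(first_pass), len(second_pass))
    print(len(pass_a), len(pass_b))
```

Passing something like `corpus` instead of `read_papers()` keeps the streaming behaviour while allowing repeated passes; whether that satisfies gensim's check is something to confirm against its documentation.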