hubReports is a side project of mine. It's a site that gathers data from GitHub on a daily basis and presents it on the website. I went live with the latest iteration of it on November 1st, 2013, and thought it was time to review how it did and where to go next.
The project had previously existed under a different domain, still under the hubReports name but with a slightly different focus. After rewriting the project and adjusting its target, I felt ready to go public.
I didn't receive much in the way of feedback except for a few comments from people on Twitter. It seemed to be liked as there were plenty of tweets that simply shared the site, which also helped pull in more visitors. The lack of more in-depth feedback did make me wonder which parts of the site were working well and which needed the most attention.
Back down to Earth
So far this year, the site gets around 200 to 300 page views a week. Not amazingly busy, but considering this whole project was about teaching myself new stuff and better techniques, I'm reasonably happy with that number.
What went wrong?
AngularJS and SEO
Not being easy for search engines to index makes the site very difficult to find. The majority of visitors are referred from the Statuscode newsletters, blog posts and tweets.
No hard feelings towards GitHub, I couldn't have done hubReports without them, but they went and created a weekly newsletter for trending repositories. That will cut into my expected audience a little, though it probably won't make much of an impact.
I have all this data, but I'm not entirely sure what the best method for displaying it is. I also have certain features planned (allowing language / repo / user comparisons), but without any feedback I've held back on implementing them, as I'm unsure of the best way to display the data. That hesitation is based purely on my worry about how the rest of the existing site is being received.
The missing newsletters
There was a "News generation" feature planned, but I haven't had the chance to work on it enough. It was to be the basis of weekly newsletters for each "top / specially selected" language that hubReports monitors. Having those newsletters available would, I believe, increase the regular audience of the site.
Data, data, data...
hubReports gathers all of its data from the GitHub API. While that makes my life easier, there have been odd bugs with the API that cause headaches in my data (missing repositories / users) or make a collection fail for that day. I'd much prefer to be working with the data from the GitHub Archive, but from my initial investigation that isn't quite on the cards. It's a lot of data to process, and I don't have the resources available to constantly process the feed for statistics without impacting the hubReports site itself.
What went well
Even though AngularJS made my life difficult with search engines, it made my life easier when it came to putting the site together.
Throwing this project together gave me the chance to work with Node.js, MongoDB, OpenShift hosting, CDNs, AngularJS, Grunt, Express and a few others. While I'd toyed around with some of this stuff already, it was nice to have a set purpose to aim for and apply them to.
The world of tomorrow
Where will hubReports go from here? I've decided to focus on a few areas that I felt let the project down:
- Comparisons: While viewing one language / repository / user, I'd like to provide a way of selecting another and an easy way to compare their statistics. This would be more interesting if I throw a dynamic graph in there, so the compared statistics can be seen over time.
- News generation: For the goal of creating newsletters, I need (and want) more content in them than just tables of trending repositories and a few graphs.
- Increase search engine friendliness: Some time definitely needs to be spent making the site more indexable. If for whatever reason I find it out of reach for the resources I have available, then so be it.
- Data retention: I must make this a priority. The database is hosted on the free tier of OpenShift and I'll eventually hit its limit, with the entire site probably grinding to a halt or forever serving out-of-date data. My initial target will be repositories / users that haven't been seen by hubReports for a set period of time. That'll buy me a decent amount of extra time. If that turns out to be a simple enough job, I may also look at how to deal with data retention of statistics, possibly going with an RRD-like solution to prevent data growth by averaging out stats over time (the older data gets, the lower the resolution it is held at).
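
To make the data retention idea a bit more concrete, here is a rough sketch of both steps in Python (not the site's actual Node.js code). Everything in it is an assumption for illustration: the function names `stale_ids` and `downsample`, the in-memory dictionaries standing in for database collections, the weekly bucket size and the thresholds are all hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import mean


def stale_ids(last_seen, today, max_age_days):
    """Return the ids whose last sighting is older than max_age_days.

    last_seen maps an id (repository or user) to the date it was last seen.
    """
    cutoff = today - timedelta(days=max_age_days)
    return sorted(key for key, seen in last_seen.items() if seen < cutoff)


def downsample(points, today, keep_daily_days):
    """RRD-like thinning of one statistic.

    points maps a date to a value. Points newer than the cutoff are kept
    at daily resolution; older points are averaged into one value per
    week, keyed by the Monday of that week.
    """
    cutoff = today - timedelta(days=keep_daily_days)
    recent = {}
    weekly = defaultdict(list)
    for day, value in points.items():
        if day >= cutoff:
            recent[day] = value
        else:
            monday = day - timedelta(days=day.weekday())
            weekly[monday].append(value)
    averaged = {monday: mean(values) for monday, values in weekly.items()}
    # Weekly keys are always before the cutoff, so they cannot clash
    # with the daily keys that were kept.
    return {**averaged, **recent}
```

A real version would run against the database rather than dictionaries, and could add coarser tiers (monthly, yearly) for even older data in the same way.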
Any comments on any of this are more than welcome. But if you think the whole project is pointless, it may not be worth saying so: I value the site primarily as a learning experience, so it wouldn't be entirely constructive if you did ;-)