In the spirit of the New Year, we thought it would be a good idea to give a little insight into how hubski works and what our plans are for its technical future.
First, a little history of hubski's codebase. As mk said in his State Of Hubski, we started out as a clone of Hacker News, which is written in Arc Lisp, Paul Graham's experimental Lisp dialect. HN is a rather focused side project, with design and code decisions made for specific ends for the hacker community pg wants to see: from the text-centric, process-list-esque design to a highly simplified markdown implementation. This is what hubski has been built on, or hacked on in some cases.
One of the main issues that comes from this is that very few other people actively work on or with Arc. There are very few libraries written in Arc, which means that if we want to interface with most other software we need to write our own client library. The other option is to just code the functionality ourselves, but most of what we write would pale in comparison to other software that has had a ton of time put into it by brilliant communities.
How things work currently
When you first connect to hubski you hit nginx, which handles much of our static content as well as cached content. If you aren't logged in, nginx will check a Redis server to see if that URL has been cached; otherwise it passes the request on to the application, a monolithic app that handles everything.
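As a rough illustration of that request path (this is not our nginx config or our Arc code), the decision can be sketched in Python. Everything named here is hypothetical: `handle_request`, the plain dict standing in for Redis, and the `app` callable standing in for the monolith. The sketch also assumes that logged-in requests always go straight to the app.

```python
from typing import Callable, Dict


def handle_request(
    url: str,
    logged_in: bool,
    cache: Dict[str, str],
    app: Callable[[str], str],
) -> str:
    """Return a page for `url`, preferring the cache for logged-out visitors."""
    if not logged_in:
        # Logged-out visitors all see the same page, so a cached copy
        # keyed by URL can be served without touching the app.
        cached = cache.get(url)
        if cached is not None:
            return cached
    # Cache miss, or a logged-in user (assumed to need a personalized page):
    # hand the request to the application.
    return app(url)


# Usage: a fake app that records which URLs it was asked to render.
rendered = []

def fake_app(url: str) -> str:
    rendered.append(url)
    return "rendered " + url

cache = {"/pub/123": "cached page 123"}
page_for_guest = handle_request("/pub/123", False, cache, fake_app)
page_for_user = handle_request("/pub/123", True, cache, fake_app)
```

How the cache gets filled and invalidated is left out on purpose, since that depends on details not covered here.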
All of our data is stored in files, each of which is just an s-expression. The app loads some or all of the posts and comments into memory in a hashtable. At the point we're at now, we can't load the entirety of our data into memory due to issues with RAM size and expensive function calls. A lot of functions, such as detecting duplicate posts when submitting, searching, or retrieving feeds, are basically map calls across this entire working set. In addition, our working set is a bit of a memory leak: it doesn't have a way to manage its size, so it just keeps growing until we reset the app (which we do fairly often).
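To make the cost concrete, here is a minimal Python sketch (the real code is Arc, and every name below is made up for illustration). `find_duplicate` shows the kind of full scan over the working set described above. `BoundedWorkingSet` shows one common way to cap an in-memory table, least-recently-used eviction; it is a generic technique, not something hubski currently does.

```python
from collections import OrderedDict
from typing import Dict, Optional


def find_duplicate(working_set: Dict[int, dict], url: str) -> Optional[int]:
    """Scan every loaded post for a matching URL: O(n) per submission."""
    for post_id, post in working_set.items():
        if post.get("url") == url:
            return post_id
    return None


class BoundedWorkingSet:
    """In-memory table that evicts the least recently used item when full."""

    def __init__(self, max_items: int) -> None:
        self.max_items = max_items
        self._items: "OrderedDict[int, dict]" = OrderedDict()

    def put(self, post_id: int, post: dict) -> None:
        self._items[post_id] = post
        self._items.move_to_end(post_id)  # mark as most recently used
        while len(self._items) > self.max_items:
            self._items.popitem(last=False)  # drop the oldest entry

    def get(self, post_id: int) -> Optional[dict]:
        post = self._items.get(post_id)
        if post is not None:
            self._items.move_to_end(post_id)  # reading also counts as use
        return post

    def __len__(self) -> int:
        return len(self._items)


# Usage
posts = {1: {"url": "http://a.example"}, 2: {"url": "http://b.example"}}
dup = find_duplicate(posts, "http://b.example")

ws = BoundedWorkingSet(max_items=2)
ws.put(1, posts[1])
ws.put(2, posts[2])
ws.get(1)                                # post 1 is now the most recent
ws.put(3, {"url": "http://c.example"})   # evicts post 2
```

An unbounded dict behaves like our current working set: nothing is ever dropped, so memory only goes down when the process restarts.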
Plans for the future
There are a lot of things that we would like to do - things that have been asked for by the community. Ideally we would have a system setup that automatically scales to deal with traffic and that makes adding services and features in any language relatively simple.
One of the first steps is going to be to give hubski a proper database. Separating the database from the hubski app will be a big step towards being able to add other services. As an example, this will allow us to more easily integrate mature search solutions such as Elasticsearch or Solr into hubski. Right now every post has a list of words which serves as its search text, and we map through our working set to find matches. This of course limits us to searching only what we can store in memory, which is unfortunate.
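The difference between the two approaches can be sketched in a few lines of Python. This is an illustration under assumptions, not our Arc code: posts are modeled as a dict from a post id to its word list, and all function names are hypothetical. `scan_search` mirrors the current map over the working set, while `build_index` builds a tiny inverted index, the basic structure that engines like Elasticsearch and Solr (both built on Lucene) rely on.

```python
from collections import defaultdict
from typing import Dict, List, Set


def scan_search(posts: Dict[int, List[str]], term: str) -> Set[int]:
    """Current style: look at every post's word list on every query."""
    return {post_id for post_id, words in posts.items() if term in words}


def build_index(posts: Dict[int, List[str]]) -> Dict[str, Set[int]]:
    """Inverted index: map each word to the set of posts that contain it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for post_id, words in posts.items():
        for word in words:
            index[word].add(post_id)
    return index


def index_search(index: Dict[str, Set[int]], term: str) -> Set[int]:
    """One dictionary lookup instead of a pass over every post."""
    return set(index.get(term, set()))


# Usage
posts = {
    1: ["arc", "lisp", "hubski"],
    2: ["nginx", "redis", "cache"],
    3: ["hubski", "search", "redis"],
}
index = build_index(posts)
by_scan = scan_search(posts, "redis")
by_index = index_search(index, "redis")
```

A real search service also handles ranking, stemming, and storing the index on disk, which is exactly the work we would rather not redo ourselves.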
The next step will be to make hubski horizontally scalable, i.e. to allow us to run multiple copies of hubski and load balance between them. One of the things this will allow us to do is run different versions of hubski at the same time (for example, a beta version that beta testers can use).
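A minimal sketch of what that routing could look like, in Python. This assumes simple round-robin balancing and a fixed list of beta testers; the class name, backend names, and usernames are all invented for the example, and in practice this job would more likely be done by the load balancer's own configuration than by application code.

```python
import itertools
from typing import Iterable, Optional


class Balancer:
    """Round-robin over app copies, with a separate pool for beta testers."""

    def __init__(
        self,
        stable: Iterable[str],
        beta: Iterable[str],
        beta_testers: Iterable[str],
    ) -> None:
        # itertools.cycle repeats each pool forever, in order.
        self._stable = itertools.cycle(list(stable))
        self._beta = itertools.cycle(list(beta))
        self._beta_testers = set(beta_testers)

    def pick(self, user: Optional[str]) -> str:
        """Choose the backend that should serve this user's request."""
        pool = self._beta if user in self._beta_testers else self._stable
        return next(pool)


# Usage
balancer = Balancer(
    stable=["app1", "app2"],
    beta=["beta1"],
    beta_testers=["alice"],
)
picks = [
    balancer.pick("bob"),
    balancer.pick("carol"),
    balancer.pick("alice"),
    balancer.pick(None),   # logged-out visitor
]
```

This only works once the app copies share their data through a separate database, which is why the database step comes first.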
What this all means
Hopefully this will give some idea of the hurdles we face when developing hubski. The ultimate goal of our work will be to make hubski's code into as much of a fertile soil as hubski is. My dream is this:
1. Have hubski run and scale on its own while having the ability to add functions and services on the fly with a plug-and-play simplicity.
2. Have a development environment which can be set up in a short amount of time so open source developers can help develop hubski or just hack away on their own.
3. Have a nice RESTful API for people who want to build things with and for hubski.
There will be a lot of questions to answer along the way. What is an appropriate API policy that allows people to be creative (and we have a very creative community) while maintaining a healthy long-term relationship with the people who choose to use it? What parts of hubski would we want to open source, and under what license? At the end of it all, I do think that we will be in a much better place. Even those of you who have no interest in the technical details behind hubski will benefit: for instance, we will be able to respond better to your feedback.
I'm excited for the future of hubski and I keep this goal in mind whenever I'm working. We have a lot of potential as a community and I firmly believe this will give us a solid foundation.