I've been testing and fixing minor things, and I think I have the SQL conversion ready to deploy. Unfortunately, I'm on vacation next week, and I'm not going to deploy it right before I leave and have things explode. Which means next weekend.
Most of the fixes were small. One bizarre bug led me to make a field load dynamically sooner than I intended. I'm really eager to get SQL deployed, so we can publish at least a preliminary API, and start loading as-needed and getting rid of things in memory.
The conversion itself will take about an hour, during which hubski.com will be down. Probably next Friday or Saturday. I'll be on IRC during the conversion, and afterward I'll make a post asking people to tag any bugs they find. It's a big change; there will be bugs.
Let me know if you have any questions.
I'm predicting no effect until we change more things to load dynamically, because it's still loading all publications on startup. So right now, once loaded, the only difference will be the ctags, which I made load dynamically. Those are only loaded when an individual story is displayed, and saved when a user adds a community tag.

In fact, it already loads less. The ctags used to be part of the pub, loaded from file: the full list of ctags. The only data actually needed to display a story is "has the current user created a ctag for this publication?" So instead of loading all ctags, it queries only that:

select count(1) from publication_community_tagses where id = $1 and username = $2;

I think it will be faster, but insignificantly so. For the same amount of data, in theory, SQL should be slower than raw disk reading. But right now we have to read and parse the full pub, for every pub. So I think it will be faster once everything is changed to fetch only the data that the currently requested page needs. More importantly, it will be infinitely more scalable.

If not, memcached. If hubski had significantly more traffic, memcached would be essential. But I think hubski typically has <1 request/second (can you verify?). If hubski were getting, say, 1k requests/second, there's no way a dozen SQL queries per request would be fast enough. We'll just have to see.
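To make the pattern concrete, here's a rough sketch of that per-story check. Hubski itself isn't written in Python, and the schema below is a minimal stand-in invented from the query above, so treat this purely as an illustration of "fetch only the one fact the page needs" rather than the actual implementation (sqlite3 stands in for PostgreSQL):

```python
import sqlite3

# Hypothetical minimal schema standing in for the real
# publication_community_tagses table; column names are taken
# from the query above, everything else is assumed.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE publication_community_tagses (
        id INTEGER,     -- publication id
        username TEXT,  -- user who created the ctag
        tag TEXT        -- the community tag itself
    )
""")
conn.execute(
    "INSERT INTO publication_community_tagses VALUES (42, 'alice', 'programming')"
)

def user_has_ctag(conn, pub_id, username):
    """Return only the one fact the story page needs: has this
    user already created a community tag for this publication?
    (No need to load the full list of ctags for the pub.)"""
    row = conn.execute(
        "SELECT count(1) FROM publication_community_tagses "
        "WHERE id = ? AND username = ?",
        (pub_id, username),
    ).fetchone()
    return row[0] > 0

print(user_has_ctag(conn, 42, "alice"))  # True
print(user_has_ctag(conn, 42, "bob"))    # False
```

The point of the design is that displaying a story costs one tiny indexed lookup instead of reading and parsing the whole pub from disk.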
| The ctags used to be part of the pub, loaded from file, the full list of ctags.

Sounds dangerous. At this point it would make sense for hubski to run on SQLite, but do you really foresee hubski requiring 1k requests per second in the near future? What 'kind' of file was it loading from beforehand?
| Sounds dangerous.

Not really; just slow.

| it would make sense for hubski to run on sqlite

I think hubski would be fine on SQLite, but PostgreSQL doesn't hurt, and it helps with a lot of things, like Elasticsearch and backups.

| do you really foresee hubski requiring 1k requests per second

No. But it would be nice if we handled the Reddit influxes well.

| What 'kind' of file was it loading from beforehand?

Serialised s-expressions. Still is, just without the community tags key.