This year I am running several conferences, and it has led me to ask this question: can science move faster? Today most scientists have far more compute power (petaflops) than they did a few years ago, and yet many scientific processes are no faster than before. For example, last year (2007) I had my very first 2009 publication. That's right: in 2009, a year from now, a journal paper that was written in 2006 will come out. So is science really keeping up with progress in related areas?
Continue reading "Science 2.0?" »
A few months ago I posted a cost analysis of startups using Amazon's EC2 service. Amazon recently announced some updates to that service that address two issues we did not examine:
- Availability Zones
- Elastic IP Addresses
Here are a couple of quick blurbs on the new capabilities.
Continue reading "Update to EC2 Services" »
Today, Eric gave a talk on our summarization technology at the University of Maryland's cloud computing speaker series.
Continue reading "Summize Text Summarization Talk @ UMD" »
Conferences are a great way to keep in sync with the latest ideas. This year I am co-chairing several conferences in the area of search. Today I thought it would be interesting to share our InfoScale 2008 call for papers. InfoScale is a conference, started a few years ago, that focuses on the problems of scaling networking and search. So check out the call for papers and the conference venue.
Continue reading "InfoScale 2008" »
We are often asked why we use a histogram visualization we refer to
as a "snip" for ratings rather than the traditional stars or numeric
scores used on most other sites. The answer is a bit more complex than
"it looks cool" and thus seemed like a great topic for a post. Star
and numeric single-value summaries for a product often hide
valuable information. This hurts users because controversial books are
often the great books, or the negative review is often the most
insightful. So today I am going to give a quick review of the history
of attitude encoding, examine the issues associated with central
tendency summaries (single numeric or star value) and demonstrate how a
histogram allows users to better understand the review-o-sphere.
Continue reading "Stars and Bars" »
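To make the point about central tendency concrete, here is a minimal sketch in Python. The two rating lists are made up for illustration, and the helper names `rating_histogram` and `render_bars` are hypothetical, not code from our site. Both lists average to the same three stars, yet their histograms tell very different stories.

```python
from collections import Counter
from statistics import mean


def rating_histogram(ratings, scale=(1, 2, 3, 4, 5)):
    """Count how many reviews gave each star value on the scale."""
    counts = Counter(ratings)
    return {star: counts.get(star, 0) for star in scale}


def render_bars(histogram):
    """Draw one text bar per star value, highest star first."""
    rows = sorted(histogram.items(), reverse=True)
    return "\n".join(f"{star} | {'#' * count}" for star, count in rows)


# Illustrative data only: six reviews each, same average, different shape.
consensus = [3, 3, 3, 3, 3, 3]          # everyone thinks it is "okay"
controversial = [1, 1, 1, 5, 5, 5]      # readers love it or hate it

for name, ratings in (("consensus", consensus), ("controversial", controversial)):
    print(f"{name}: mean = {mean(ratings)}")
    print(render_bars(rating_histogram(ratings)))
    print()
```

A single star value would show these two books as identical. The bars make it obvious that one of them splits its readers.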
Startups today need far less capital for compute resources than they did just a decade ago. This has been driven by improvements in CPU speeds, by people designing systems that run on cheap commodity servers, and by an overall trend of getting more computing for less. A decade ago, I couldn't imagine building the Summize technology and processing the data we have today with less than a few million dollars in hardware. Now $25k gets you a lot of computing power. Even with cheaper hardware, these resources come with additional costs that can be real issues for a startup. For example, you still must buy and set up servers, find rack space, and deal with backups, DNS servers, and many other tasks before those servers are usable. There is also the additional time and knowledge needed to perform the maintenance that keeps them running. In this post I examine some of these costs to help figure out when to build vs. rent these resources. Others can benefit from this, as the problem is not specific to Summize but common to compute-intensive startups.
Continue reading "Computing Compute Resources for a Startup" »
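The build vs. rent question comes down to a break-even calculation, which can be sketched roughly as below. This is a simplified model under stated assumptions, not the analysis from the full post: the function name `months_to_break_even` is hypothetical, and the monthly upkeep and rental figures are placeholders chosen only to show the arithmetic. The $25k hardware figure is the only number taken from the text above.

```python
import math


def months_to_break_even(hardware_cost, monthly_upkeep, monthly_rent):
    """Return the first month at which owning is no more expensive than renting.

    Owning costs hardware_cost up front plus monthly_upkeep every month
    (rack space, power, admin time). Renting costs monthly_rent every month.
    Returns None when renting is never more expensive per month than upkeep.
    """
    monthly_saving = monthly_rent - monthly_upkeep
    if monthly_saving <= 0:
        return None
    return math.ceil(hardware_cost / monthly_saving)


# $25k of hardware is from the post; the other two figures are assumptions.
hardware_cost = 25_000
monthly_upkeep = 1_000   # assumed: hosting, power, maintenance time
monthly_rent = 3_500     # assumed: equivalent rented capacity

months = months_to_break_even(hardware_cost, monthly_upkeep, monthly_rent)
print(f"Owning pays for itself after {months} months")
```

With these placeholder numbers the purchase pays for itself after 10 months. Plug in your own upkeep and rental costs, since those are the terms that decide the answer.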