I have a potential interview coming up with a certain technology company, which shall not be named, though let’s just say their motto should be Go Big or Go Home. To help me prepare, the recruiting contact made a number of suggestions, including that I be ready for questions like the following:
Expect questions designed to test your analytical and technical ability (example: how would you store all the phone calls in the world?).
I’ve been tossing and turning all night thinking about this question, even though I’m sure it won’t come up in the interview. How would I answer it? I think the real question is how I would not answer it. That question raises a dozen more:
What is the purpose of storing all the phone calls?
Who are you storing it for? The originators? The world?
When you say store all the world’s phone calls, do you mean the metadata (i.e., who called whom and when) or do you mean actual recordings of the conversations?
If we’re talking recordings, there are many legal and societal implications to consider before we even address the engineering.
How will this information be indexed? Retrieved? Accessed?
Do we need it searchable or just filed away?
How long do you need to store this information?
The security and privacy implications of such a system are huge. Consider a phone app that listens for sensitive data much like a keylogger. See http://blogs.computerworld.com/17785/sensory_malware_android_app_listens_then_steals_credit_card_data
In terms of pure data, you’re not really looking at much per person (compared to music audio or video). Voice-quality audio is about 8 kbit/s, which is 1 kbyte/s. Assuming the average phone user spends one hour per day on the phone, that’s 1 kbyte/s × 60 seconds × 60 minutes = 3,600 kbytes = 3.6 MB per day. Not much by today’s standards. Assuming 1 billion phones worldwide, that’s 3.6 billion megabytes per day (or 3.6 million gigabytes). Now this isn’t insubstantial. We’re talking 3,600 terabytes PER DAY. For comparison, Google’s search index uses around 1,000 terabytes and YouTube about 45 terabytes.
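To double-check the arithmetic, here is a quick back-of-envelope sketch in Python. The constants are just the assumptions from the paragraph above (8 kbit/s voice audio, one hour of calls per user per day, 1 billion phones, decimal units), and the function names are my own, so treat it as a rough estimate rather than a measurement.

```python
# Back-of-envelope estimate. Every constant below is an assumption
# taken from the post, not a measured figure.
BITRATE_KBIT_PER_S = 8          # assumed voice-quality bitrate
CALL_SECONDS_PER_DAY = 60 * 60  # assumed one hour of calls per user per day
PHONES_WORLDWIDE = 1_000_000_000  # assumed number of phones


def per_user_mb_per_day(bitrate_kbit_per_s=BITRATE_KBIT_PER_S,
                        call_seconds=CALL_SECONDS_PER_DAY):
    """Megabytes of audio one user generates per day (decimal units)."""
    kbytes = bitrate_kbit_per_s / 8 * call_seconds  # 8 bits per byte
    return kbytes / 1000


def worldwide_tb_per_day(phones=PHONES_WORLDWIDE):
    """Terabytes of audio generated per day across all phones."""
    total_mb = per_user_mb_per_day() * phones
    return total_mb / 1_000_000  # 1 TB = 1,000,000 MB


if __name__ == "__main__":
    print(f"Per user:  {per_user_mb_per_day():.1f} MB/day")
    print(f"Worldwide: {worldwide_tb_per_day():.0f} TB/day")
```

Changing any one assumption (say, a higher bitrate or fewer minutes per day) scales the totals linearly, so the numbers are easy to adjust.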
I would have to say that one approach would be to use distributed storage. Let each user store their own phone calls locally on their phone. Now we’re back at 3.6 MB per phone user per day, which could easily be stored on a 1 GB SD card. In fact, you could store roughly nine months’ worth of phone calls on each phone. Of course, we’re back to the original questions I asked: why are we storing this? Does storing it in a distributed fashion like this meet our functional requirements?
Even if we’re doing this for searching purposes, we could index the phone calls in a centralized location, allow the index to be searched, and pull up the actual recordings from the phones on demand. Each phone call would naturally be stored on at least two phones (one per party), so you have some redundancy, but you could have even more by having each phone user agree to store other users’ phone calls (encrypted, of course). Using a 10 to 1 replication factor would still allow the storage of up to a month’s worth of phone calls on each phone equipped with a 1 GB card.
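The per-phone capacity figures can be sanity-checked the same way. This is a minimal sketch that assumes a 1 GB card means 1,000 MB, that each user generates 3.6 MB per day, and that a "10 to 1 factor" means each phone holds ten users' worth of calls; the function name is hypothetical.

```python
# Rough capacity check. All constants are assumptions from the post.
CARD_MB = 1000             # assumed 1 GB SD card, decimal units
MB_PER_USER_PER_DAY = 3.6  # assumed one hour of 8 kbit/s audio per day


def days_of_storage(card_mb=CARD_MB,
                    mb_per_day=MB_PER_USER_PER_DAY,
                    replication=1):
    """Days of calls a card can hold when each phone stores
    `replication` users' worth of daily audio."""
    return card_mb / (mb_per_day * replication)


if __name__ == "__main__":
    print(f"Own calls only: {days_of_storage():.0f} days")
    print(f"10 to 1 factor: {days_of_storage(replication=10):.0f} days")
```

With these assumptions, a phone storing only its own calls gets a bit over nine months out of the card, and a phone storing ten users' worth gets a little under a month.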
Now that I’ve thought it through, I hope they ask this question…