Monday, May 23, 2011

Memory optimizations for Static Data - ByteArrays et al.

This article details the course we took to statically load and query a large number of Java objects in memory. We started out using 5-6 GB per million objects with a HashMap. It currently stands at 150 MB per million objects. That's a 40X improvement over the Java HashMap. If loading large amounts of data into RAM and querying it fast is something you are interested in, read on...

Java HashMaps - A logical starting point was a HashMap. With a HashMap, speeds were awesome (~3 Million QPS per thread). Space, though, was painfully bloated. Storing about 1 Million objects in RAM required ~6 GB of space. The actual size of the data was as follows -
  • 15 strings of about 20 characters each. Assuming 2 bytes per character - 15 X 20 X 2 = 600 bytes
  • 15 other ints, floats etc. 4 X 15 = 60 bytes.
  • Total = 660 bytes
Data size for 1 Million objects - 660 bytes X 1 Million = 660 MB
Actual RAM used by java hashmap and underlying objects - 6GB
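
The estimate above can be written out as a small calculation. This is only a sketch of the arithmetic: the class and method names are made up for illustration, and the 6 GB figure is simply the measured value quoted above.

```java
public class DataSizeEstimate {
    // Raw payload of one object: 15 strings x 20 chars x 2 bytes, plus 15 numeric fields x 4 bytes.
    static long bytesPerObject() {
        long stringBytes = 15L * 20 * 2; // 600 bytes
        long numericBytes = 15L * 4;     // 60 bytes
        return stringBytes + numericBytes;
    }

    // Ratio of the space actually used to the raw data size.
    static double loadFactor(long actualBytes, long objectCount) {
        return (double) actualBytes / (bytesPerObject() * objectCount);
    }

    public static void main(String[] args) {
        long million = 1_000_000L;
        System.out.println("Bytes per object: " + bytesPerObject());
        System.out.println("Raw data for 1M objects (MB): " + bytesPerObject() * million / 1_000_000);
        System.out.println("Ratio at 6 GB used: " + loadFactor(6_000_000_000L, million));
    }
}
```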

Performance details for the Java HashMap are below.
Objects loaded | Space used | Load factor (actual space / data size) | Queries per second (single thread) | Time for 1000 queries (single thread)
1 Million | 5-6 GB | 9-10 | 3 Million | 0.3 ms
I'm considering the time for 1000 lookups because a typical request to our application required ~1000 lookups. Speed was a non-issue, but the size of the data was a big one. There is approximately a 9X bloat from storing various references, indexes and bloated Java Strings!!!

The optimization machinery needed to spring into action. Here is the problem statement -
  • Need to statically load several million Java objects into RAM. (Several = as many as possible, up to ~45 Million)
  • Each of the objects contained about 15 Strings and about 15 other native data types.
  • All lookups will be key based. public BigObject get(String key)
  • Searching on this data should be extremely efficient. A couple of thousand queries in a couple of ms is an absolute must.
Several alternatives were tried, with varying degrees of success. Let me list the solutions in increasing order of success.

Judy arrays with JNI bindings - This was meant to be a replacement for the Java HashMap. This trie-based structure (in C) promised efficient memory usage and extremely fast access. Thanks to Carl Lobo for the JNI bindings. Using this turned out to be quite a problem. At the time of writing, we only had a Linux-based Judy library, so the Windows devs needed to deploy remotely to test their stuff. The JNI bindings were not complete, so you needed to extend them to use more complex functionality. But still, we got quite a boost in memory usage. Judy would return us a number that served as the index to look up in a Java ArrayList. Space-time characteristics achieved by this are detailed below.
Objects loaded | Space used | Load factor (actual space / data size) | Queries per second (single thread) | Time for 1000 queries (single thread)
1 Million | 1.7 GB | 2.5-3 | 1.5-2 Million | 0.6 ms

While there was a 3X gain in memory, the bloat was still substantial.

A simple Sorted Java Array - Hold on a second, did I just say Array! Is the solution to this convoluted problem a simple Array? Turns out it does improve space requirements, and speeds are quite acceptable. This solution was to just do binary searches over an array of objects sorted by the key. We also happened to eliminate some of the data that we were loading into the object. The data in the new object was about 500 bytes. Space-time characteristics were as below.
Objects loaded | Space used | Load factor (actual space / data size) | Queries per second (single thread) | Time for 1000 queries (single thread)
1 Million | 1 GB | 2 | 0.5 Million | 2 ms
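
A sorted-array lookup of this kind could look roughly like the sketch below. The Record class with a single int field is a hypothetical stand-in for our real object, and the class and method names are invented for illustration.

```java
import java.util.Arrays;
import java.util.Comparator;

public class SortedArrayLookup {
    // Hypothetical stand-in for the real object, which held many more fields.
    static final class Record {
        final String key;
        final int value;
        Record(String key, int value) { this.key = key; this.value = value; }
    }

    private final Record[] sorted;

    SortedArrayLookup(Record[] records) {
        sorted = records.clone();
        // Sort once at load time; the data is static afterwards.
        Arrays.sort(sorted, Comparator.comparing((Record r) -> r.key));
    }

    // Key-based lookup via binary search: O(log n) comparisons and no per-entry map overhead.
    Record get(String key) {
        int lo = 0, hi = sorted.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int cmp = sorted[mid].key.compareTo(key);
            if (cmp < 0) lo = mid + 1;
            else if (cmp > 0) hi = mid - 1;
            else return sorted[mid];
        }
        return null; // key not present
    }

    public static void main(String[] args) {
        SortedArrayLookup store = new SortedArrayLookup(new Record[] {
            new Record("delhi", 2), new Record("agra", 1), new Record("mumbai", 3)
        });
        System.out.println(store.get("mumbai").value);
        System.out.println(store.get("pune"));
    }
}
```

The trade-off is the one in the table: each lookup costs around 20 string comparisons for a million entries instead of one hash probe, in exchange for dropping the per-entry overhead of the map.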

Something though still made us look further, why the hell is the load factor greater than 1? Also some of our strings were repeated. Should the load factor not be actually less than 1? Also we had a realization that an Empty String in Java takes about 40 bytes. In addition, our Strings were English Strings so we just wanted to store one byte per character. These thoughts led us to the native ByteArray implementation.

Native Byte Arrays - What was this? All the objects stored as just one big chunk of bytes allocated statically, with each object occupying a fixed-size slice of the array.
Also, since all our Strings were English, we decided to store only 1 byte per character.
To get to the 50th element you would just do this calculation.
Elem(50) = bytes 49 X ObjSize to 50 X ObjSize - 1
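
The offset calculation above might look like this in Java. This is only a minimal sketch: the record layout (a 30-byte key plus one int), the class name and the helper names are assumptions for illustration, not the actual implementation.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class FixedRecordStore {
    // Illustrative layout only: a 30-byte single-byte-encoded key followed by one int.
    static final int KEY_BYTES = 30;
    static final int OBJ_SIZE = KEY_BYTES + Integer.BYTES;

    private final byte[] data;

    FixedRecordStore(int capacity) {
        data = new byte[capacity * OBJ_SIZE]; // one allocation for all records
    }

    // The n-th element (1-based, as in the text) starts at byte (n - 1) * OBJ_SIZE.
    static int offsetOf(int n) {
        return (n - 1) * OBJ_SIZE;
    }

    void put(int n, String key, int value) {
        byte[] k = key.getBytes(StandardCharsets.ISO_8859_1); // 1 byte per character
        int off = offsetOf(n);
        Arrays.fill(data, off, off + KEY_BYTES, (byte) 0);     // zero padding
        System.arraycopy(k, 0, data, off, Math.min(k.length, KEY_BYTES));
        ByteBuffer.wrap(data, off + KEY_BYTES, Integer.BYTES).putInt(value);
    }

    String keyAt(int n) {
        int off = offsetOf(n);
        int len = 0;
        while (len < KEY_BYTES && data[off + len] != 0) len++; // stop at the padding
        return new String(data, off, len, StandardCharsets.ISO_8859_1);
    }

    int valueAt(int n) {
        return ByteBuffer.wrap(data, offsetOf(n) + KEY_BYTES, Integer.BYTES).getInt();
    }

    public static void main(String[] args) {
        FixedRecordStore store = new FixedRecordStore(100);
        store.put(50, "mumbai", 42);
        System.out.println(offsetOf(50) + " " + store.keyAt(50) + " " + store.valueAt(50));
    }
}
```

Because every record has the same size, no per-object header or reference is needed, and a binary search can compare keys directly on the bytes.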

Due to the English constraint, the size of a single object was reduced to ~300 bytes.

Helpers were written to ensure that transformation (to and from Java objects), comparison and binary search could be done easily. For the Strings, we made the following assumptions -
  • Strings would be English
  • String length will not exceed 30 characters. If it does exceed, they will be truncated.
Not only did this mean that the load factor would not exceed 1, but we were actually able to do better. :)
All Strings that existed were put into 1 static bytearray, and each of the objects just had references into this big bytearray of Strings. This meant that each distinct String would be stored only once. All occurrences would just be references to the big String bytearray.
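
A shared String bytearray along these lines could be sketched as follows. The class name, the HashMap used for de-duplication while loading, and the zero-byte padding are my assumptions for illustration; the 30-character limit and the 1 byte per character come from the assumptions listed above.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class StringPool {
    static final int SLOT_BYTES = 30; // longer strings are truncated, as described above

    private byte[] pool = new byte[SLOT_BYTES * 16];
    private int slots = 0;
    // Only needed while loading, to detect duplicates (an assumption of this sketch).
    private final Map<String, Integer> seen = new HashMap<>();

    // Returns the slot index for s, storing its bytes only the first time it is seen.
    int intern(String s) {
        String t = s.length() > SLOT_BYTES ? s.substring(0, SLOT_BYTES) : s;
        Integer existing = seen.get(t);
        if (existing != null) return existing;
        if ((slots + 1) * SLOT_BYTES > pool.length) {
            pool = Arrays.copyOf(pool, pool.length * 2); // new space is zero-filled
        }
        byte[] b = t.getBytes(StandardCharsets.ISO_8859_1); // 1 byte per character
        System.arraycopy(b, 0, pool, slots * SLOT_BYTES, b.length);
        seen.put(t, slots);
        return slots++;
    }

    // Rebuilds the String stored in the given slot; zero padding marks its end.
    String get(int slot) {
        int off = slot * SLOT_BYTES;
        int len = 0;
        while (len < SLOT_BYTES && pool[off + len] != 0) len++;
        return new String(pool, off, len, StandardCharsets.ISO_8859_1);
    }

    int slotCount() {
        return slots;
    }

    public static void main(String[] args) {
        StringPool pool = new StringPool();
        int a = pool.intern("Mumbai");
        int b = pool.intern("Mumbai");
        System.out.println(a == b);
        System.out.println(pool.get(a));
    }
}
```

An object then stores a small int per String field instead of a Java String reference, which is where the load factor below 1 comes from when values repeat.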
Space-time characteristics of the ByteArray based structures are below.
Objects loaded | Space used | Load factor (actual space / data size) | Queries per second (single thread) | Time for 1000 queries (single thread)
1 Million | 150 MB | 0.5 | 300 Thousand | 3 ms

Shortcomings and fixes - The current constraint of having to truncate strings longer than 30 characters can be avoided if we terminate each string with a special character and have all references to the string bytes point to the actual byte index at which the string is stored, not the array index. The same Strings bytearray can also be adjusted to hold Unicode instead of ASCII.
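
One way the terminated-string idea could work is sketched below. We have not built this, so everything here is an assumption: the zero byte as terminator, the choice of UTF-8 for the Unicode case, and all class and method names. De-duplication is left out to keep the sketch short.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class TerminatedStringPool {
    // Assumed terminator: a zero byte, which UTF-8 only produces for the NUL character.
    static final byte TERMINATOR = 0;

    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    private byte[] pool = new byte[0];

    // Appends s plus a terminator and returns the byte offset where s starts.
    int add(String s) {
        int offset = buffer.size();
        byte[] b = s.getBytes(StandardCharsets.UTF_8);
        buffer.write(b, 0, b.length);
        buffer.write(TERMINATOR);
        return offset;
    }

    // Call once after loading; lookups read from the frozen array.
    void freeze() {
        pool = buffer.toByteArray();
    }

    // Reads from the byte offset up to the next terminator, so there is no length limit.
    String get(int offset) {
        int end = offset;
        while (pool[end] != TERMINATOR) end++;
        return new String(pool, offset, end - offset, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        TerminatedStringPool pool = new TerminatedStringPool();
        int first = pool.add("a string that is clearly longer than thirty characters");
        int second = pool.add("ताज महल");
        pool.freeze();
        System.out.println(pool.get(first));
        System.out.println(pool.get(second));
    }
}
```

Strings now take only as many bytes as they need plus one, at the cost of objects holding byte offsets rather than small slot numbers.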
We are yet to take up these last 2 tasks and we don't really need them in the foreseeable future. I'll check if we can open source this implementation for all to use and improve.

*Please note that all credit for the conceptualization, design and implementation of the ByteArray structure goes to Harish Chiugurupati. I am merely someone who understands the problem that was solved and the solution.

Monday, May 12, 2008

Advantage - India

Indian Internet properties typically lag their American counterparts by a couple of years. It was heartening to see at least some Indian sites actually take the lead and US sites creating clones of an established Indian product.


YourBillBuddy lets you find the best mobile plan across all service providers in your region. You upload your mobile bills, they look at your usage and tell you the plan best suited to it. The site has been around for more than 2 years now. Although not a big success yet, it's a pretty useful idea and tends to save you a lot of time and money. Recommendations based on my bills say that I could save 46% on my calls every month by switching providers!! Even without switching from Hutch, my potential savings based on their other recommendations are quite significant.


YourBillBuddy recommendations for my mobile bills

Techcrunch reviewed a recently launched US clone that does the same. This is the first time that I noticed a US clone of an Indian website and it did bring a smile to my face. They say "Imitation is the best form of flattery for Internet properties" and the site's creators have reason to be flattered.


One problem I see with the site is that there really is no reason for people to come back to the site after they have optimized their plans once. People have short memories and not many would remember such a service existed 6 months after using it. Perhaps a client on the phone itself which monitored your calls and gave you your best suited plan would be of greater value.


Yahoo Glue was launched on the Yahoo India search pages last week. Glue attempts to create something like a homepage for the query you search for. The page is actually a mashup of search results from a variety of sources. In addition to traditional search results, a search for Angelina Jolie gives you a fact sheet, images of the actress, top music tracks and more. Search for soccer and you get things like a Wikipedia entry and league tables. The product is not really aimed at India or created by an Indian company. I am writing about it here as it has been launched in India first. This to me is Yahoo saying that India is a market which is mature enough to give quality feedback and be the first testing ground for a product which can change the face of search (literally) globally.


Search result for Taj Mahal

Unfortunately, examples like these are rare and I do not have more to write about. Do let me know if you know of any other sites from India that lead the world in ideas or in technology.

Sunday, May 4, 2008

The Indian Social Networking Mess

The past year or so has seen the advent of several local social networking sites. There have been TV ads, celebrity endorsements, the occasional local event sponsorship and of course, loads of invite mails in your mailbox. I would typically give a minute or two to some of these sites but have never quite got the rationale for them. Anyways, I decided to have a closer look.

Globally social networking sites are huge traffic generators. According to Alexa - Myspace, Facebook, Orkut and Hi5 are among the 20 most visited sites in the world. Not a single Indian networking site has made the cut to be in the top 1000 sites. Indyarocks and ibibo managed to scrape into the top 100 sites for the Indian audience though.

Traffic ranks of some popular Indian social networks

Essentially for a social networking site to be successful, it should either

  1. Compete directly with the likes of Facebook, MySpace and Orkut and have significant advantages for users to convince them to switch. By doing this the new property will be globally competitive.
    OR
  2. Have strong India specific features which give the sites an edge locally

Here is what I felt about some of the players I reviewed -

Bigadda - Expectations were built with the background of this property. The site is owned by Reliance's ADA group and has been getting hype lately from Big B's blog hosted on this site.
My experience on the site started with an "import contacts" tool that did not work at all. I gave my Gmail as well as Orkut account details but could not get any of my contacts to the site with funny authentication errors or with a message that I did not have any contacts!!!!

Bigadda offers vanilla functionality. The 'addas' and forums together give the groups/communities functionality. The site also offers photo sharing, video sharing and blogging capabilities. Blogs lacked layout customizations, photos lacked a bulk upload tool and there was a small size limit to video uploads. The site is painfully slow and clicking on links takes forever. There were categories for Cricket and Bollywood throughout the site but nothing much to write about as far as India specific features are concerned.

This was by far the worst of the 3 sites I reviewed in detail.

Indyarocks - I first got to know of this site when a short movie by our group made it as a finalist to Indyarocks's short film festival. According to the graph above Indyarocks has recently become the most popular Indian social networking site and continues to gain at a swift rate. Indyarocks offers good space and bulk upload tools for videos and photos. The tools did seem to have some glitches though. Interestingly, Indyarocks offers to pay you money for videos that you either upload or see. I have a tendency to not like things that I don't understand though and a site paying me to see as well as to show my videos is definitely something that makes me feel unsure about them.

The site is highly India centric. Send free SMSes to India, chat in rooms on Indian topics, check out local movie listings in a large set of movie halls, Bollywood chit chat, local classifieds. Even the games on the site are India centric - Cricket and Bollywood games!! I can't help but notice that the India specific features offered by Indyarocks are not really related to social networking though. In fact most of these tools are available without logging in and Indyarocks is acting more as an Indian portal than an Indian social network.

Ibibo - The site's irritating "dont be a balti" advertisements introduce the site in bad taste. The flat, slightly downward user traffic graph (graph on top), despite big advertising budgets, seconds my opinion.

The Indian context of the site is about opinions and polls. While the idea is interesting and relevant, it may not be enough to get the viral growth necessary for social networks to take off. Search on the site is half baked and lacks the ability to correct spelling, look for synonyms and alternate words. The photo section of the site was competitive with top social networking sites with soothing UIs, no absolute upload limits, bulk upload and import tools.

Yaari and Desimartini - I had invites to these sites in my Email and decided to check them out. I sure do regret that decision. There was no easy way for me to add the 8 friends who had sent me invites for Yaari. The email contacts importer was broken and I decided to just stop then and there. Desimartini - they're basically an Orkut clone with a 'fun' section. A section which seemed to have no activity??

Companies and websites can typically be classified into three types: some are built to last and some are built to sell. Most of the sites I reviewed seemed to fall in the third category, 'built to fail'. With the exception of Indyarocks, which has the potential to be a very good portal for India, the other sites are definitely not among sites that I would like to go to in the foreseeable future.

Verdict on Indian social networks

P.S - Specialized sites for students (notably bharatstudent), professionals, finance or any other specific category have not been covered in this report.

Tuesday, April 29, 2008

Google is bigger than Sex!!!

Now that was a surprise, wasn't it? Graphs at Google Trends show that the number of people who searched for Google at the search engine was consistently higher than for sex over the last year (with the one exception of the Christmas holiday season). I tried finding another term that would go higher than sex but nothing that I could think of did!! Facebook and Myspace showed a phenomenal rise but did not quite beat it. Microsoft, blog, obama were not even comparable. Do let me know if you can find something.


Graph showing search density of selected keywords


And remember, Google knows what you did last Christmas!! *** Evil Grin ***

Cheers!!

Saturday, April 26, 2008

Car deals @ carwale

I remember brainstorming with a friend about 2 years ago regarding the usefulness of a used car site in context of the Indian market. The biggest problem in and deterrent to thinking further along those lines was the potential difficulty in engaging sellers to come online to sell their goods. Another problem was attribution of the sales done through us. Carwale has definitely proven us wrong.


Search result at Carwale

255 Used cars to choose from in the Rs3,00,000 to 4,00,000 price range in the city of Mumbai is a huge collection. I do not have numbers to prove this but no physical dealer is likely to match up to that kind of a product catalog. The numbers in Delhi were even better. I'm putting up Mumbai as an interesting average search result for the site. The site claims to have more than 10,000 used cars on sale.

To say that sites like Carwale are inspired is an understatement. They belong to the genre of sites which pick up a successful business model in the US and apply the same to the Indian market. Sites like Autobytel, Edmunds, Carsdirect have been successfully implementing the same idea in the US for ages. While the ideas for these "inspired" sites may not be totally novel, good implementations of useful solutions can prove to be successful businesses.


Carwale: Inspired but useful

Carwale offers a comprehensive set of tools - an approximate price calculator for used cars, a dealer locator, car comparisons, loan comparisons, an approximate insurance premium calculator and an EMI calculator are a few of them. A strong user community is actively contributing to car reviews and forums.

The site is good at narrowing down choices for the customer and moving him towards a purchase. The Recommend Car section asks questions like a dealer would ask - What is your budget? Are you more concerned with performance, resale value, comfort or economy? Are you very tall and would like a lot of leg/floor room? The tool then generates recommendations for buying a car based on your inputs.

Users would miss a similar tool for the used car section. A user will be quite lost on getting 255 cars in his budget to choose from!!

Carwale has their eyes on numerous avenues to generate revenue. They generate leads for new cars, car insurance and car finance. They charge various subscription and brokerage charges from used car dealers. The site displays ads for cars and insurance companies and will be valuable property to advertise at for these companies.

In all Carwale is a site all set to take full advantage of the growing base of Internet users in India.

Keep up the good work Carwalo!!

Cheers :)

Monday, April 21, 2008

Browser wars

Browsers are central to our web experience. The term "web browser" is really an understatement for the application. They are application platforms for rich Web 2.0 sites; "operating system for the world wide web" would probably suit them more than the simple "browser".
Erik Larkin at PC World takes a look at 3 of the most popular browsers for the PC today as they push out a major release or will do so in the near future. He compares the newly released Apple Safari 3.1, Firefox 3 Beta 5 and Internet Explorer 8 Beta 1. Read on, I'm sure you'll find the article as engaging as I did.

Web browsers: which one works best

Cheers :)

Saturday, April 19, 2008

ITasveer - Printing photos, a click away

Have you ever had a relative ask you to send them photographs of your last trip when you had never bothered to get hard copies of them? I personally worried for a long time last year about the pains that I would have to go through to get prints and then courier them to relatives. My solution to the problem was to keep delaying till both of us forgot about it. ;) Sites like itasveer and picsquare offer to print your online photos and deliver them at quite reasonable prices. In fact, at Rs 2.90 per pic, itasveer is cheaper than the friendly neighborhood studio. Add to that a Rs 30 delivery charge and you're done.
Started by a group of 4 IIT Delhi alumni, itasveer offers two kinds of services - Photo sharing and printing. Printing includes printing of photographs as well as customized souvenirs.

Let me start with photo sharing. The site has substantial photo sharing capabilities allowing you to upload and share photographs with friends. Like a lot of other sites including the likes of Flickr, there is no absolute limit on the amount you can upload. But there is a more subtle limit on how much you can upload to your account in a given period of time. Personally though, I don't see the point of them doing it. With the giants pumping huge money into this space, I do not see how itasveer will match Google on uptime, reliability and even features in the long run. Photo organizing and sharing should soon start to include software that recognizes faces after training. Clearly this is not a space which a small company wants to enter without a distinct technological leap. Even if, say, itasveer was able to give service comparable to or even better than Flickr, Photobucket or Picasa, I would still be highly inclined to store my photographs on the servers of a large company rather than a small startup which may not exist 2 years down the line.

That's enough of what I believe is the bad side of their business, let me look at the good one. For one the photo printing definitely fills a void. Despite several players existing today, a good execution of the service should lead to a strong success story. The idea of souvenirs with custom photographs and themes made their case stronger.
The interface of the site is soothing and I was comfortable finding my way through the site. The importer from Picasa especially impressed me. The photographs seemed to be available to itasveer very quickly and they did not even ask me for my Google password in the process. The site handles photographs from Picasa and Flickr well but lacks importers from a bunch of other tools, most notably Photobucket.
Doodlepad is what they call their tool to design various souvenirs. It seemed to work well in that it allowed extensive customizations to designs of cups, calendars and other items. The pre-existing themes are nice and the team deserves credit for creating them.



A calendar page under construction using itasveer's doodlepad


One thing that did not seem quite right in the doodlepad was the lack of a way to create items without extensive customizations. I wondered why someone who is so bad with designs was forced to go through so much customization to create a calendar, when giving the tool 12 pictures and selecting a theme should be enough to create a default one. This ability may prove to be tremendously useful for the artistically challenged, which btw constitutes the majority of technology workers at least!!
Besides that, there were a couple of glitches in the site, but overall things were working smoothly. I think making doodlepad more friendly for dumb folk like me would be a big leap forward for them.
Wishing itasveer the best of luck for the future!!

Friday, April 18, 2008

Weekly dose of desi internet

From time to time, the writer in me awakes and asks me to write down my thoughts. Last time I tried (here), I could only manage a few posts over the course of almost 2 years. But this time, things are gonna be different. This time I'm a writer on a mission.
The mission is to cover internet startups from India. Internet startups are something that I read a fair bit about and this should only be a logical extension of a pre-existing hobby. Hopefully I will be able to provide my readers with a dose of exciting startups in the Indian context.

Forward March Mr writer!!! Forward march!!!


Inshallah. Is baar to Fateh hogi ;)

[Readers are most welcome to tell me about Indian startups and I'll cover as many as my weekends allow me. Post startup links in comments or mail me with their briefs at prashant.maskara at gmail dot com]