Friday, June 26, 2009

Microsoft's Velocity - Distributed Caching

Introduction

In physics, velocity is the rate of change of position, which has very little to do with the concept of “Distributed Caching”. Even so, MS-Velocity is slowly but surely catching on, both on its conceptual merits and because the software applications world needs it right now. Scalability and availability have become default necessities in today’s software applications; hence Velocity is here.

Scope of this Article

A thorough understanding of the Microsoft way of distributed caching.

Who is making this noise

A project team within Microsoft, code-named “Velocity”. It is at the beta stage (a beta is one step short of going live, though a beta may itself go through several sub-versions). Microsoft has a much fancier name for it: CTP, “community technology preview” in full, and the releases are called CTP1, CTP2 and CTP3. Velocity is currently at CTP3. Frankly speaking, to me it looks like an alpha version and gives you the feeling of bits and pieces still being integrated. For example, for most of the time the CTP3 bits were ready but the CTP3 samples were missing. The samples are ready now, of course, but now the help file needed to understand the sample code is missing.

Caching Story

A cache is a small, high-speed memory that supplements the central processing unit and the physical disk storage. As the microprocessor processes data, it looks first in the cache, and if it finds the data there it does not have to do the more time-consuming read from larger, slower memory. A cache also acts as a buffer when the CPU accesses data on disk, smoothing over the speed mismatch between the two, because disk reads and writes are generally much slower than CPU operations.

Caching in Web Applications

We cannot imagine the MSN portal, the Amazon web site, or the corporate SAP financial application being down when we need it. The same is true of any online banking website in the world. Fundamentally, applications need to be available all the time, to support access at any time and from anywhere.

Another major expectation, especially from application developers and datacenters, is that applications be scalable and available at a low cost.

One of the most important factors in building high-performance, scalable web applications is the ability to store items, whether data objects, pages, or parts of a page, in memory the first time they are requested. You can store these items on the web server or in other software in the request stream, such as a proxy server or the browser. This lets you avoid recreating information that satisfied a previous request, particularly information that demands significant processor time or other resources. This is known as caching, and it lets you use a number of techniques to store page output or application data across HTTP requests and reuse it. The server does not have to recreate the information, which saves time and resources.
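To make the idea concrete, here is a minimal sketch of "store it the first time it is requested, reuse it afterwards". It is plain Python rather than ASP.NET or Velocity code, and the names (SimpleCache, get_or_create, render_catalog_page) are made up for illustration.

```python
import time


class SimpleCache:
    """Tiny in-memory cache with a time-to-live per item (illustrative only)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so expiry can be tested
        self._items = {}             # key -> (expires_at, value)

    def get_or_create(self, key, factory):
        """Return the cached value, or build it with factory() and cache it."""
        now = self._clock()
        entry = self._items.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]          # cache hit: no recomputation
        value = factory()            # cache miss or expired: do the work once
        self._items[key] = (now + self._ttl, value)
        return value


calls = []


def render_catalog_page():
    # Stand-in for expensive work such as rendering a page or querying a database.
    calls.append(1)
    return "<html>catalog</html>"


cache = SimpleCache(ttl_seconds=60)
first = cache.get_or_create("catalog", render_catalog_page)
second = cache.get_or_create("catalog", render_catalog_page)
```

The second request is served from memory, so the expensive function runs only once until the entry expires.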

Distributed Caching is not entirely a Brand New invention.

Distributed caches are not new. During the last couple of years several caching products (memcached, NCache and SharedCache) have emerged to address the performance and scalability needs of applications. Most of these support key-based access. Other than memcached (by Danga Interactive), which is an open source technology, most of the others target enterprises and enterprise workloads and scale. I think web workloads require a considerably larger scale, with thousands of cache nodes in a cluster. Web-scale distributed caches not only require mechanisms that can scale and provide availability in very large clusters; they must also be easy to manage, or self-managing.

ASP.NET Caching & How Velocity is different

Velocity is all about caching and only caching. Before discussing Velocity, I would like to go briefly over the basics of ASP.NET caching. ASP.NET has two types of caching: a) the page output cache and b) the application cache. Page output caching saves the output of page processing and reuses that output instead of reprocessing the page when a user requests it again, whereas application caching lets you cache data you generate, which could be any object, for example a DataSet or a business object.

The ASP.NET Cache object runs in the same process as your web application, which can be an advantage or a disadvantage depending on your needs. Either way, it is clearly not a distributed cache, which means the ASP.NET cache cannot be shared among multiple servers. If you want the same cached data on multiple servers, you must duplicate the cache on each server.

The greatest advantage of ASP.NET caching is that it works great for web applications running on a single server, but in a web-farm scenario with multiple web servers there is no straightforward implementation. Similarly, whenever a server fails, its cached data is lost and the natural thing to do is reload the data into our objects programmatically. These two issues can be considered limitations, because they do not help sites that have millions or billions of users. Scalability is an issue with ASP.NET caching unless the programmer comes up with a brilliant workaround.
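The web-farm limitation is easy to see in a small sketch. This is plain Python, not ASP.NET; WebServer is a made-up class, and the shared dictionary only stands in for a distributed cache cluster that real servers would reach over the network.

```python
class WebServer:
    """One web server in a farm; `cache` is wherever it keeps cached items."""

    def __init__(self, name, cache=None):
        self.name = name
        # With no cache passed in, the server gets its own private dict,
        # which mimics an in-process cache.
        self.cache = cache if cache is not None else {}

    def put(self, key, value):
        self.cache[key] = value

    def get(self, key):
        return self.cache.get(key)


# Case 1: every server has its own in-process cache.
web_a = WebServer("web-a")
web_b = WebServer("web-b")
web_a.put("cart:42", ["book"])
local_miss = web_b.get("cart:42")    # web-b never saw the item

# Case 2: both servers talk to the same cache
# (a plain dict here, standing in for a cache cluster).
shared = {}
web_c = WebServer("web-c", cache=shared)
web_d = WebServer("web-d", cache=shared)
web_c.put("cart:42", ["book"])
shared_hit = web_d.get("cart:42")    # web-d sees what web-c stored
```

In the first case a user whose next request lands on a different server loses the cached cart; in the second case it does not matter which server handles the request.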

Velocity addresses the above limitations of ASP.NET caching.

Microsoft envisions “Velocity” becoming an integral part of the .NET application stack in the near future, targeting both enterprise and web workloads (and scale).

As applications start using caches for data access, it is easy to believe they will demand richer data services such as query, transactions, analytics and synchronization. Who knows, running LINQ queries against a distributed cache, just as applications query the backend SQL Server database today, could become one of the most common programming requirements for .NET applications in the coming days. Microsoft envisions “Velocity” becoming a comprehensive distributed caching platform. The performance, scale, and availability of “Velocity”, along with its rich data services, will allow for the development and deployment of rich web and enterprise applications.

Distributed Caching fits here

In general, distributed caches are especially well suited to applications with the following characteristics:

- A considerable number of the data requests are read-mostly (e.g. product catalogs)
Large concurrent access to such data can be provided by replicating the catalog data on multiple cache nodes. Since updates to such data are infrequent, maintaining consistency (synchronously or asynchronously) is not very expensive.

- Applications that can tolerate some staleness of data
Such applications can provide better performance and scale by not requiring caches to be refreshed immediately on every update.

- Applications that can work with highly partitioned data (e.g. session data, shopping cart)
High scale and performance can be supported by partitioning and distributing data across multiple cache nodes, thereby distributing data processing across the cache nodes.

- Applications that can work well with eventual consistency
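The partitioning idea from the list above can be sketched in a few lines. This is only an illustration in plain Python: hashing the key and taking it modulo the number of nodes is my own simplification, not Velocity's actual partitioning scheme, and PartitionedCache, node_for_key and the node names are made up.

```python
import hashlib


def node_for_key(key, nodes):
    """Pick a node for a key using a stable hash (same key, same node)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


class PartitionedCache:
    """Spreads items over several cache nodes, each modelled as a dict."""

    def __init__(self, node_names):
        self._names = list(node_names)
        self._stores = {name: {} for name in self._names}

    def put(self, key, value):
        self._stores[node_for_key(key, self._names)][key] = value

    def get(self, key):
        return self._stores[node_for_key(key, self._names)].get(key)

    def sizes(self):
        """Number of items held by each node."""
        return {name: len(store) for name, store in self._stores.items()}


cache = PartitionedCache(["node-1", "node-2", "node-3"])
for i in range(300):
    cache.put(f"session:{i}", {"user": i})
```

Each session lives on exactly one node, so reads and writes for different sessions are spread across the nodes. A real product has to do more than this: with a plain modulo, adding or removing a node moves most keys to a different node.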

System Requirements & Prerequisites

Supported Operating Systems:
Windows Server 2003 Service Pack 2; Windows Server 2008; Windows Vista Service Pack 1; Windows XP Service Pack 3

How to attack Velocity in no time

Download the CTP3 samples and do the shared-folder kind of setup on a single machine (your PC or laptop). Do not get confused by the PowerShell tool and all the help that comes with it; forget it for now. Simply open the Velocity administration tool and start the cluster by typing Start-CacheCluster (remember that this works happily only if you are using the CTP3 DLLs with the CTP3 samples; also remember to copy the installation DLLs to your sample application folder and add references to them). That's it, you are all set.

You will get to see more on Velocity here day by day. I'm delighted at its ease of use, especially the switch from existing session code to the distributed cache plug-in with just a few changes in the configuration file. Cool.
