Monday, 28 January 2013

Want to store frequently accessed data on a faster storage (SSD)? Storage Policies can help.


Vertica "Storage Locations", as Vertica users know are storage paths on disk that you specify to store data and temp files. Each Vertica cluster requires a location to store data, and another location to store catalog. These default storage locations (that we specify during Vertica install & database creation) must exist on all the nodes.

e.g.
/home/dbadmin/testdb/v_testdb_node0001_data
/home/dbadmin/testdb/v_testdb_node0001_catalog

Whenever we add data to database or perform any DML operation, the data is stored in these storage locations on disk. Vertica also supports adding new storage locations to each nodes, if we want to increase our storage capacity. The new storage location can be on Vertica nodes, or a local SAN.

So far so good, but till Vertica 6.0, we didn't have the option to selectively store any database object (database, schemas, tables) on any particular storage location. All the data added or modified was stored in the default storage location. Having this option/freedom is quite desirable when we have some business critical (frequently accessed) tables that we want to store on a faster (and not to mention costlier) SSD disk.

Vertica 6.1 release (code-name Bulldozer) lets us do this with two newly introduced storage concepts, viz. Storage Labels & Storage Policies.

Location Label: This lets us create labelled storage locations. A new optional parameter has been added to ADD_LOCATION function to create a storage location with descriptive label. These labelled storage location are used to define storage policies.

VSQL=> select add_location ('/home/dbadmin/SSD/schemas','v_testdb_node0001', 'data', 'SSD');
          add_location 
-------------------------------------
/home/dbadmin/SSD/schemas added.
(1 row)


The example above creates a new data storage location with label "SSD". We will use this label to identify storage location while creating storage policy.

Once we have added our faster storage location (identified by a label), it's time to create a Storage Policy to associate database objects with it. The newly introduced Vertica function that lets us do this is called "SET_OBJECT_STORAGE_POLICY". Once a storage policy exists, Vertica uses the labeled location as the default storage location for the object data. Storage policies let you determine where to store your critical data. One example I already discussed above. Only one storage policy can exist per database object.


VSQL=> select set_object_storage_policy ('SALES', 'SSD');
     set_object_storage_policy      
-----------------------------------
 Default storage policy set.
(1 row)


Every time data is loaded and updated, Vertica checks to see whether the object has an associated storage policy. If it does, Vertica automatically uses the labeled storage location. If no storage policy exists for an object, or its parent entities, Vertica uses default storage algorithms on available storage locations.

A Storage Policy can be cleared using function "clear_object_storage_policy", specifying the object name associated with Storage Location.

VSQL=> select clear_object_storage_policy('SALES');