Leveraging Virtual Storage: Data Duplication Across On-Prem and Cloud Repositories

Bill Tolson
Mar 12, 2024

A recent forecast by Benzinga estimates that the enterprise storage market will grow at a compound annual growth rate (CAGR) of 14% from 2023 to 2030.

Almost every company is dealing with rising levels of digital data. With the continuing adoption of cloud storage and computing, big data analytics, generative AI, and IoT technologies, there has been an enormous surge in data generation and use, necessitating continuing corporate investments in both on-prem and cloud storage infrastructure and personnel.

Cutting storage requirements

Because digital data volumes keep rising, organizations have tried various methods to contain the associated costs. By placing limits on end-user storage (share-drive allocations) or applying unrealistically short retention periods to non-regulated data, companies have pushed employees to look for other places to keep their data, such as local workstations, smartphones, and personal cloud accounts. These unmanaged and untracked storage practices can put companies at risk of regulatory non-compliance and eDiscovery liability.

To help control the rising costs of data storage, technologies like data compression and deduplication have emerged as powerful techniques for saving storage space, enhancing storage systems' performance, controlling costs, increasing reliability, and providing enhanced enterprise data security.

Both data deduplication and compression have been used for years to control and reduce storage requirements.

Data deduplication

Data deduplication technology ensures that only one instance of each unique piece of data is retained on storage media (not including backups), at either the block or file level. Redundant files or individual data blocks are replaced with a pointer to the original unique data copy.

Deduplication ensures that exact copies of the same file are not stored again. Instead, only the additional metadata is kept for each copy: information about file ownership, data rights, and other technical aspects of the file itself. This reduces overall storage overhead and cost.
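To make the pointer idea concrete, here is a minimal sketch of file-level deduplication in Python. It is a toy model, not how any particular product works: the DedupStore class, its method names, and the choice of SHA-256 content hashes are all assumptions made for illustration.

```python
import hashlib


class DedupStore:
    """Toy file-level deduplicating store (illustrative only)."""

    def __init__(self):
        self.blobs = {}     # content hash -> the single stored copy
        self.pointers = {}  # file name -> (content hash, per-file metadata)

    def put(self, name, data, owner):
        digest = hashlib.sha256(data).hexdigest()
        # Store the bytes only if this exact content has not been seen before.
        self.blobs.setdefault(digest, data)
        # Every file still gets its own small metadata record and a pointer.
        self.pointers[name] = (digest, {"owner": owner, "size": len(data)})

    def get(self, name):
        digest, _meta = self.pointers[name]
        return self.blobs[digest]

    def logical_bytes(self):
        # What users think they are storing.
        return sum(meta["size"] for _digest, meta in self.pointers.values())

    def physical_bytes(self):
        # What actually sits on the storage media.
        return sum(len(blob) for blob in self.blobs.values())


store = DedupStore()
report = b"quarterly report " * 1000
store.put("alice/report.docx", report, owner="alice")
store.put("bob/report-copy.docx", report, owner="bob")
store.put("carol/notes.txt", b"meeting notes", owner="carol")

print(store.logical_bytes(), store.physical_bytes())
```

Alice and Bob each see their own file with their own ownership metadata, but the report's bytes are held only once. Block-level deduplication follows the same pattern, hashing fixed or variable-size chunks instead of whole files.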

Data compression

Data compression is another space-saving technology. It reduces the number of bytes needed to represent the original data by removing redundant or unnecessary information. Compressed files are usually noticeably smaller and so require less storage space. Smaller files can also translate directly into smaller and faster system backups.

Note: backup applications can also incorporate both data compression and deduplication processing (source and target).

Data compression and deduplication can allow organizations to save 40-90% of the initially required storage space. Both technologies have become standards for enterprise data management practices.
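As a quick illustration of the compression side, the sketch below uses Python's standard zlib module to compress a block of text losslessly and then restore it. The sample text is invented and deliberately repetitive, so the saving it reports will be far better than you should expect from typical business files.

```python
import zlib

# Invented, highly repetitive sample data (real files compress less well).
text = ("Invoice 2024-03-12: storage virtualization services. " * 200).encode("utf-8")

compressed = zlib.compress(text, level=9)   # level 9 = best compression
restored = zlib.decompress(compressed)      # lossless: original bytes come back

saving = 1 - len(compressed) / len(text)
print(f"original={len(text)} bytes, compressed={len(compressed)} bytes, saved={saving:.0%}")
```

Already-compressed formats such as JPEG images or ZIP archives usually shrink very little when compressed a second time, which is one reason real-world savings vary so widely.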

But we need more - Storage Virtualization

Another tool that many organizations have used successfully to control data storage growth is storage virtualization. I have described copy data virtualization in several past restorVault blogs, so I will not describe it in detail again here; however, the simple description of storage virtualization technology is:

Storage virtualization presents a logical view of physical storage resources (both cloud and on-prem) to a host computer system, treating all storage media in the enterprise as a single pool of storage. By abstracting physical storage resources and presenting them as a single virtual pool, storage virtualization enhances organizational flexibility, storage and compute scalability, and cost-effectiveness, because files can be accessed from their most appropriate storage locations.
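The "single pool" idea can be sketched in a few lines of Python. This is a toy model under stated assumptions: the VirtualPool class, the backend names, and the use of plain dictionaries as stand-ins for storage systems are all hypothetical.

```python
class VirtualPool:
    """Presents several storage backends as one logical namespace (toy model)."""

    def __init__(self, backends):
        self.backends = backends   # backend name -> dict standing in for storage
        self.catalog = {}          # logical path -> name of the backend holding it

    def write(self, path, data, tier):
        # If the file is being moved to a different backend, drop the old copy.
        old = self.catalog.get(path)
        if old is not None and old != tier:
            del self.backends[old][path]
        self.backends[tier][path] = data
        self.catalog[path] = tier

    def read(self, path):
        # The caller never names a backend; the catalog resolves the location.
        return self.backends[self.catalog[path]][path]

    def listdir(self):
        return sorted(self.catalog)


pool = VirtualPool({"on_prem_nas": {}, "cloud_archive": {}})
pool.write("/finance/2024/q1.xlsx", b"active data", tier="on_prem_nas")
pool.write("/finance/2019/q1.xlsx", b"inactive data", tier="cloud_archive")

print(pool.listdir())
print(pool.read("/finance/2019/q1.xlsx"))
```

The host sees one directory listing and reads every file the same way, while the catalog decides whether the bytes come from on-prem or cloud storage.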

If you are curious about past blogs describing storage virtualization in more depth, you can read the following blogs:

The bottom line is that storage virtualization can dramatically reduce the number of backup copies needed to ensure ongoing data availability (see the “Beyond the 3-2-1 Backup Strategy – Data Virtualization” blog listed above).

Copy Data Virtualization also has its place

On the other hand, copy data virtualization can also be used to virtually duplicate and move files and file systems when needed for data security, regulatory compliance, disaster recovery, and accessibility. One obvious use case is in the legal/eDiscovery services industry.

Suppose you have a short window to respond to an eDiscovery request and want your external law firm or eDiscovery services provider to initiate and manage your eDiscovery response. You could conduct broad (and costly) enterprise searches for potentially relevant content and send these massive data sets to your outside counsel or provider for eDiscovery processing. However, collecting all relevant data could take days or weeks and still leave you open to missing potentially responsive data, which could result in spoliation (destruction of evidence) or an insufficient eDiscovery response. Additionally, the time needed to move the data sets to outside counsel before they can begin the review could put you at risk of missing the judicial eDiscovery window.

Another widespread use case involves moving virtualized data to another location; for example, moving virtualized on-prem file shares to another corporate location or cloud, such as a SharePoint Online file share. Virtualized file systems can translate to much shorter migration times and reduced data loss or corruption risk.

A third use case for copy data virtualization and duplication comes from the relatively recent adoption of generative AI applications, which need to quarantine and access substantial amounts of sensitive company information as AI training data sets.

Once multiple complete copies of the data have been created (two copies are usually enough) and stored in a highly secure and low-cost cloud platform, the virtual data files (pointers) and the file system structure can be moved and/or duplicated and distributed to multiple locations for specialized access and use.
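A rough way to picture this is a script that recreates a directory layout somewhere else and writes a small pointer file in place of every real file. The sketch below does that with Python's standard library. It is not restorVault's implementation: the make_stub_tree function, the JSON pointer format, and the "secure-cloud-vault" label are all hypothetical.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def make_stub_tree(source_root, target_root, repo_label):
    """Recreate the directory layout under target_root, writing a small
    JSON pointer in place of each file (hypothetical stub format)."""
    source_root, target_root = Path(source_root), Path(target_root)
    count = 0
    for src in sorted(source_root.rglob("*")):
        rel = src.relative_to(source_root)
        dst = target_root / rel
        if src.is_dir():
            dst.mkdir(parents=True, exist_ok=True)
            continue
        dst.parent.mkdir(parents=True, exist_ok=True)
        pointer = {
            "repository": repo_label,   # where the full copy lives
            "key": rel.as_posix(),      # lookup key inside that repository
            "size": src.stat().st_size,
            "sha256": hashlib.sha256(src.read_bytes()).hexdigest(),
        }
        dst.write_text(json.dumps(pointer))
        count += 1
    return count


with tempfile.TemporaryDirectory() as tmp:
    # Build a small stand-in for the secure repository.
    source = Path(tmp) / "secure_repo"
    (source / "legal" / "case42").mkdir(parents=True)
    (source / "legal" / "case42" / "contract.pdf").write_bytes(b"x" * 5_000_000)
    (source / "legal" / "memo.txt").write_bytes(b"privileged memo")

    # Duplicate only the structure and pointers to a second location.
    mirror = Path(tmp) / "outside_counsel"
    stubs = make_stub_tree(source, mirror, repo_label="secure-cloud-vault")

    full = sum(f.stat().st_size for f in source.rglob("*") if f.is_file())
    virtual = sum(f.stat().st_size for f in mirror.rglob("*") if f.is_file())
    print(f"{stubs} stubs, {full} bytes of data vs {virtual} bytes of pointers")
```

The mirrored tree keeps the same folder and file names, so it can be browsed like the original, while the data itself never leaves the source repository. A real system would also need a component that resolves each pointer back to the stored data on access, which this sketch leaves out.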

By distributing the virtualized copies across different physical locations, organizations can realize significant corporate benefits, including improved application responsiveness, reduced downtime, greater employee productivity, and enhanced protection against data loss or corruption.

Remember, copy data virtualization lets you keep your valuable and sensitive data in a secure repository while moving or copying the file system structure and virtual pointers to other locations. Because each pointer uses only approx. 1KB instead of MBs or GBs per file, copying or moving virtualized data from place to place can happen much faster than the days, weeks, or months it can take to migrate the actual data.
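A back-of-the-envelope calculation shows why the pointer size matters so much. Only the approx. 1KB pointer size comes from the text above; the file count, the average file size, and the link speed below are assumptions picked for illustration, and the formula ignores latency, protocol overhead, and per-file handling costs.

```python
def transfer_seconds(total_bytes, link_megabits_per_s):
    """Idealized transfer time: ignores latency, protocol overhead, and retries."""
    return total_bytes * 8 / (link_megabits_per_s * 1_000_000)


FILES = 1_000_000            # assumed number of files
AVG_FILE_BYTES = 5 * 10**6   # assumed 5 MB average file size
POINTER_BYTES = 1_000        # approx. 1 KB per pointer, per the text
LINK_MBPS = 100              # assumed 100 Mbit/s link

full = transfer_seconds(FILES * AVG_FILE_BYTES, LINK_MBPS)
virtual = transfer_seconds(FILES * POINTER_BYTES, LINK_MBPS)

print(f"full data: {full / 86_400:.1f} days")
print(f"pointers:  {virtual:.0f} seconds")
```

Under these assumptions, moving the full 5 TB data set takes about 4.6 days of continuous transfer, while moving the 1 GB of pointers takes about 80 seconds. Real migrations add per-file overhead on both sides, so treat the numbers as an order-of-magnitude comparison only.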

Data duplication based on storage virtualization has gained widespread adoption in various industries, including healthcare, financial services, and e-commerce, where efficient, compliant, and secure data management is required.

Copy data virtualization for data duplication and migration has become a simpler and faster way to move large amounts of data to other places, geographies, or repositories than physically moving the complete data sets from location to location.

restorVault virtual data duplication

restorVault has been a leader in enterprise storage virtualization solutions for many years and has been awarded several US patents that take storage virtualization to new levels. restorVault’s storage virtualization solutions provide organizations with several benefits, including:

Enhanced storage utilization and simplified data management capabilities
Improved data availability and resilience
Simplified scalability and flexibility
Cost efficiency and investment protection
Data protection against disasters and ransomware attacks
And the ability to duplicate, move, and share large, unstructured data sets without the need for time-consuming and risky large-scale data migrations

restorVault virtual data duplication (also known as v-dup™ and v-duplication™) is a powerful solution for creating virtual copies of unstructured data from a variety of sources. These virtual copies are much easier to copy and move elsewhere for specific purposes such as eDiscovery processing, disaster recovery, data sovereignty requirements, and data migrations (for example, during M&A activities).

Note: data sovereignty refers to the concept that data an organization creates, collects, stores, and processes is subject to the data retention and privacy laws of the nation where the data originated.

With the rise of digital transformation and cloud adoption, data sovereignty regulations have become a significant roadblock for businesses needing to transfer data to other divisions or partners. Many countries have passed data sovereignty laws targeted at the control and storage of locally created data.

Since many governments seek to prevent other nations from acquiring their citizens' personal data, they have created data sovereignty laws restricting if and how businesses can transfer personally identifiable information (PII) outside their jurisdiction.

This can be an issue for multinational corporations that must share information with other divisions or partners outside the data’s country of origin.

For example, using restorVault’s virtual data duplication, a corporation’s French division could provide seamless access to specific data stored in their French data center by duplicating the data set’s file system structure and creating virtual pointers to the original data in the French data center. This virtualized file system could then be migrated outside the originating country. This solution would provide non-French employees access to the original data in the French data center without (possibly) violating the French data sovereignty laws.

One caveat here is that you should always get an opinion/approval from your legal department before moving forward with any data sovereignty solution, mainly because each country could have additional data sovereignty requirements that could preclude this solution. Additionally, you must ensure (via processes and technology) that data accessed in the foreign data center is not copied or stored outside the country.

restorVault virtual data duplication is a powerful solution that can help businesses improve their data management practices while consolidating large amounts of data into less expensive storage resources. It can help companies to reduce the complexity of their data infrastructure, meet regulatory compliance requirements, improve employee productivity, and improve their data security.

Contact us today to learn how restorVault can help your company save money by virtualizing and managing inactive data while increasing data security, regulatory compliance, and storage capacity!