Efficient Replication using Multicast Protocols

By Oscar Wahlberg, Director of Product Management

Replication as Data Protection

The use of replication as a data protection method is not new. Historically, it was relegated to disaster recovery for tier-1 applications and did not have a role in day-to-day backups.

Recently, however, many customers are using replication as their primary mechanism for backup and recovery for all tiers of systems – primarily due to the advent of object storage systems with built in replication and versioning. Unfortunately, the side effect is that this significantly increases the amount of data that needs to be replicated.

If one is to rely on replication as the primary method of data protection, one must replicate each version of every object to multiple nodes. Many systems transmit the entire object when it changes, then replicate the object multiple times if they are replicating it to multiple destinations. This exacerbates the issue by requiring even more bandwidth.

There is another way to protect your data…

NexentaEdge reduces the amount of bandwidth necessary to send the data to multiple nodes by doing two things: 1) using a multicast replication protocol that can send data to multiple nodes with a single transmission and 2) sending only portions that have been changed within an object instead of the entire object.

Replicast is our multicast replication protocol that allows us to transmit data to as many nodes as necessary without having to transmit it multiple times. An object is split into many chunks and each chunk transmitted by a single Replicast transmission to multiple destinations.
During transmission, the chunk is split into multiple UDP datagrams. Those destinations then reassemble received datagrams and compare them against a SHA-256 hash value for the chunk. This provides a much stronger check against data corruption than typical network checksums. Replicast transmission is intelligent enough to sustain the loss of UDP datagrams during chunk transmission and will retransmit only lost datagrams – hence, this greatly reduces bandwidth usage on busy networks.

Replicast also results in efficiencies when determining where to send data. Most replication systems decide which nodes to replicate on a round-robin basis without any consideration for how busy an individual node happens to be. Replicast uses multicast datagrams to dynamically select the storage servers for each chunk.

Instead of deciding amongst all servers, Replicast looks up a multicast group that is a subset of all nodes, then those nodes dynamically select which nodes will ultimately store the chunk. This allows fewer nodes to serve a higher number of IOPs by ensuring that the most available nodes are the ones that will receive new data, and other nodes that are congested may be given a break.

Splitting each object into chunks also gives us the ability to use cryptographic hashes to identify which chunks have changed and which haven’t. To store a new version of an object, we only need to replicate the chunks that have changed and update the metadata for that object. This results in significant bandwidth savings over alternative approaches.

Conclusion
Replication is playing a much greater role in day-to-day data protection, and there is significant room for improvement in its efficiency. The combined use of Replicast, our multicast replication protocol, and only transmitting changed chunks for new object versions is about as efficient as replication can get.

Tagged with: , , , , , , ,
Posted in Software-defined data center, Software-defined storage

Three Dimensions of Storage Sizing & Design – Part 3: Speed

By: Edwin Weijdema

No-speed-limitIn this third post of the multi-post Three Dimensions of Storage Sizing & Design we will dive deeper in the dimension: Use and specifically the application Workloads part characteristic Speed. Depending on the application workload requirements, you will need to size the storage system for The Need for Speed. So let us dive deeper into IOPS.

Speed with computer storage devices like hard disk drives (HDD), Solid State Drives (SSD) and Storage Area Networks (SAN) is expressed in Input/Output Per Second (IOPS). Applications will interact with storage for retrieving and storing data.

Speed_Latency_TurboBoost

What are IOPS?

IOPS are expressing the performance a storage device can achieve with completing read and write commands in a second. We will look at a simplified example for how many IOPS a disk can deliver within the boundaries of physics. These are called the back-end IOPS. Simply put how many IOPS can the disk(s) in the back-end deliver. Every disk in your storage system has a maximum IOPS value that is based on a formula, namely:

IOPS = 1000ms / (Average Seek time in ms + Average Latency in ms)

  • Rotational speed is measured in revolutions per minute (RPM). How fast the disk platters are rotating inside the disk.
  • Average Latency is the time taken for the platter to undergo half a disk rotation. Why half? Well at any one time the data can be either a full disk rotation away from the head, or it might already be right at it. The time taken for a half rotation therefore gives us the average time it takes for the platter to spin round enough for the data to be retrieved. To calculate the average latency take the RPM and use the following formula: Average latency = ((60 seconds / RPM speed)/half rotation 2)*1000ms.
  • Average Seek time is the time in milliseconds (ms) it takes for the disk’s head to position itself over the track being read or written. When the I/O request comes in for a particular bit of data, the head will not be above the correct track on the disk. The arm will need to move so that the head is directed over the correct track where it must then wait for the platter spin to present the target data beneath it. As the data could potentially be anywhere on the platter, the average seek time is time taken for the head to travel half way across the disk. There are both read and write seek times; by taking the average of those two values give you the average seek time.

Theoretical IOPS for Calculation Sizing

For disk (HDD and SSD) I normally use the numbers in the following table as a rule of thumb to calculate raw backend random IOPS from disks. Raw? Yes RAW, because we don’t factor in caching/buffering in the whole chain (disk, controller, head, adapter, hypervisor, operating system, application, workstation), RAID influence, interface connection, driver configurations, queue depths, etc. But lets keep this simplified to have a workable model to calculate the IOPS from the backend easily.

Disk RPM IOPS ~ Average IOPS range Average Latency (ms)
5.4K HDD 55 50-65 5,6
7.2K HDD 80 75-100 4,2
10K HDD 120 120-140 3,0
15K HDD 180 175-210 2,0
SSD (WI) 40000 65000-115000 0,1**
SSD (MU) 8500 7000-25000 0,1**
SSD (RI) 3000 2000-15000 0,1**
BSSD (ICE) 20000 5000-85000 0,1**

 

** Latency on SSD is not because the spinning of the disk (duhhh no mechanical moving parts in here, latency on SSDs is on average 0.03ms!), but the network chain between the processing and the SSD. For more insight in different SSDs and their IOPS number take a look at this wiki page.

Front-End versus Back-End IOPS

Often you will hear that an application will need 2.400 IOPS and that it communicates with 64K block size. Like for instance Microsoft SQL Server 2014 OLTP Log files. These are the so-called Front-End IOPS, but to let that data land on the Back-end storage systems they generate Back-End IOPS.

Size does matter!

Lets compare a storage system that stores the data with a block size of 128K and a storage system with 4K blocks.

FrontEnd_BackEnd_IOPS

We will see that it will require 15 disks and 1.200 IOPS in the back-end when the storage system can store with 128K blocks. If you store this same I/O stream on a storage system that utilizes 4K blocks you will need a whopping 38.400 IOPS. To back those incoming IOPS you will need to run with 480x 7.2K NL-SAS disks!!!

Like most applications you can have different front-end IOPS and block sizes per workload, even within one application. Microsoft SQL 2014 server is a good example that uses a different disk access pattern as shown in the following table:

Typical SQL Server Disk Access Patterns

Info: Oracle DB default block size is 8K, and the Hadoop default block size is 64M.

Ok now we know that size does matter but why should you care? Most SQL Server performance issues in (virtual) environments can be traced to improper storage configuration. SQL Server workloads are generally I/O heavy, and a misconfigured storage subsystem can increase I/O latency and significantly degrade performance of SQL Server. Running Microsoft SQL server on VMware? Checkout this valuable resource about Architecting Microsoft SQL Server on VMware vSphere.

Playing the Tetris Storage Block gameStorage_Tetris

With virtualization we introduce another layer into the I/O path. So what about all those different layers? If you look at a complete stack in a datacenter you could see an Application running on an Operating System that is installed within a VM. This VM runs on a Hypervisor, where this hypervisor talks to a storage backend and eventually lands on spinning disk and/or flash.

Example:

  1. Application – Microsoft SQL server – Logs 64K
  2. Operating System – Microsoft Windows 2008R2 – 4K
  3. Hypervisor – VMware vSphere 6 (ESXi) – runs VMFS-5 data store – 1M Unified Block
  4. Storage – NexentaStor 4 – 128K Block
  5. Disk – SanDisk BSSD – 16K block with a 4K Sector size

HDDs have a block size of 512 Bytes or 4K. With the up rise of flash we see the Block size being increased. Looking at the SanDisk – Board Solid State Drive (BSSD) card with its ICE chips gives the flash card 8TB storage capacity, which has a 16K physical block size (virtual page) and a logical 4K or 512 bytes sector size.

Block Size versus Sector Size

While sector specifically means the physical disk area, the term block has been used loosely to refer to a small chunk of data. Block has multiple meanings depending on the context. In the context of data storage, a filesystem block is an abstraction over disk sectors possibly encompassing multiple sectors. Most file systems are based on a block device, which is a level of abstraction for the hardware responsible for storing and retrieving specified blocks of data, though the block size in file systems may be a multiple of the physical block size.

Determining block size while formatting the file system in an OS is a case of tradeoffs. Every file must occupy at least one block, even if the file is zero bytes, so there’s something for the file’s metadata to be attached to. Unless you can guarantee that your files will ALWAYS be some multiple of the block size in size (e.g. in a 4k block OS, all files are 4k), there’ll be a certain amount of wastage for the files that don’t exactly fit within that block.

It’s all about Balance

Going with small block sizes is good when you need to store many small files. On the other hand, more blocks is more metadata, so you end up wasting a portion of your storage system on overhead, tracking the location of all the files. On the flip side, large blocks means less metadata, but also mean greater wastage when you’re storing small files. e.g. a 4 byte file stored in a 4k block wastes 99% of that block.

Sectors are an obsolete concept in modern drives. They existed when “locations” on a drive were specified by the old CHS (cylinder, head, sector) definition, which wasted a lot of space. All modern drives use LBA – logical block addressing, so sectors don’t really exist anymore. However, an OS can still chain multiple blocks/sector into a single logical OS-level block to reduce space. E.g. “every 8 real blocks/sectors on the drive will be considered 1 block by the Operating System”. A sector is a unit which is normally 512 bytes or 1K, 2K, 4K etc. depending on hardware.

When the logical block size of a drive is not in multiples of 512 bytes, geometry information is not available because the file system does not support other block sizes. Linux does not discover such drives, and Windows shows such drives in disk management, but does not allow you to execute any operations on them.

Partition Alignment

Aligning file system partitions is a well-known storage best practice for database workloads. Partition alignment on both physical machines and VMFS partitions prevents performance I/O degradation caused by unaligned I/O. An unaligned partition results in additional I/O operations, incurring penalty on latency and throughput. Lets look at the example we used before with the SanDisk BSSD, this BSSD uses 16KB Blocks and 4K sector sizes. For protection of the data and increasing the total amount of IOPS and throughput we stripe/mirror several cards together.

Properly Aligned

DiskAlignment.png

If a workload with blocksizes is properly aligned you will see that it looks like a Tetris game filling up and because of striping over more than one disk in storage systems you really want to get the most out of it.

Mis-Aligned

DiskAlignment_Mismatch.png

When for instance a block size of 17K is chosen for the stripe it means that you will have all kinds of problems. The Writes and Reads will cross sectors, blocks and disks carving them up into all kind of chunks and messes the data stream up with lots of additional I/O, latency and fragmentation.

Aligning with VMware vSphere 5.x or later

vSphere 5.0 and later automatically aligns VMFS-5 partitions along a 1 MB boundary (1MB Unified block). If a VMFS-3 partition was created using an earlier version of vSphere that aligned along a 64KB boundary, and that file system is then upgraded to VMFS-5, it will retain its 64KB alignment. 1 MB alignment can only be achieved when the VMFS volume is create using the vSphere Web Client.

More to the equation

There is more to the whole equation about Speed than discussed so far. Factors like:

  • Protection levels like for instance RAID, where you get write penalties or multipliers with Read that effect the overall Speed. I will dive deeper and explore this further in the next blog part(s) around the Storage Dimension Protect.
  • Caching on several levels in the whole I/O data path like Application, OS, Hypervisor, Host, HBA, Storage System, Disk, etc. which makes it much more complicated to predict the Speed and IOPS.
  • Data reduction techniques, which I will highlight in the Storage Dimension Manage.

In the next part of Three Dimensions of Storage Sizing & Design we will dive deeper into workload characteristic Throughput (MB/s).

 

Posted in Uncategorized

Raise Your SDS IQ (6 of 6): Practical Review of Containerized SDS

by Michael Letschin, Field CTO

This is the sixth of six posts (the last one was Virtual Storage Appliances) where we’re going to cover some practical details that help raise your SDS IQ and enable you to select the SDS solution that will deliver Storage on Your Terms. The sixth (and final) SDS flavor in our series is Containerized.

Containers are a relatively new entrant on the storage scene – and they’re hot, because unlike virtual machines, they use shared operating systems; this means they’re incredibly more efficient than hypervisors in terms of system resource usage. The big benefit is that you can get more apps (think four to six times more) on the same old servers, and you can run those apps basically anywhere. Because the container space is virtualized, storage via Containers could be considered SDS. For storage, the containerization approach varies: It’s either all local storage as in the diagram on the left, or, as on the right, external components sharing file, block, or object presentation that gets integrated into container as staple storage. You can use solutions like Flocker to get stateful storage, which is important because not every app is completely stateless.

Containerization is useful for testing in DevOps or for use in hyperscale environments; and the storage is highly portable, which means flexibility is high. Currently few enterprises are actually moving production apps into these environments, with key issues being the ability of apps to write to containers and limited (but growing) knowledge of IT staff on this new virtualization paradigm. Containers aren’t designed to scale up to accommodate a lot of storage—which enterprise apps usually need – but it may be that a solution to this emerges.. You’ll notice that we’ve let off the grades for Performance and Cost Efficiency. For Performance, because Containers run in a virtual environment, there are far too many variables to provide a standard ranking; for Cost Efficiency, again, lots of dependencies on the underlying environment, although the ability to use existing infrastructure is a big plus.

Overall grade: (soft) B – incomplete grading

See below for a typical build and the report card:

Screen Shot 2016-07-05 at 12.51.23 PM

Tagged with: , , , , ,
Posted in Raise Your SDS IQ, Software-defined data center, Software-defined storage, Uncategorized

Three Dimensions of Storage Sizing & Design – Part 2: Workloads

By Edwin Weijdema, Sales Engineer, Nexenta

In this second post of the multi-post Three Dimensions of Storage Sizing & Design we will dive deeper in the dimension: Use and specifically the application Workloads part. By developing an understanding of the different kind of workloads and their characteristics will enable you to look at your different workloads and determine what impact that will have on the design and sizing. Knowing, understanding and classifying which applications will run as a workload interacting with the storage, gives you an important puzzle piece.

Knowing workloads

Applications have been created to support us and automate processes to work efficient and swift with the available data. Today we use, protect and manage an overwhelming amount of data that is being transformed into information through all kinds of applications. Most of these applications also interact with our storage systems. Overlooking the divers application landscape and how they interact with the storage systems(s), we can organize them in several categories where the most common ones are:

Transactional Enterprise Applications – are used to work with data that triggers an internal or external event or transaction that takes place as an organization conducts its business. A lot of people also like to simplify this by calling it the database backend. For example an online ordering system, where orders are being entered through a web interface, and than stored in a database.  (e.g. SQL, Exchange, CRM, e-Business, Oracle, SAP, PostgreSQL, Sharepoint, Call Centre Systems, etc.)

Virtualization – translates the physical hardware and operating system the application runs on and creates a virtual machine in the form of a few files gathered in a folder. Most organizations are using Virtual Machines and/or Virtual Desktops to run their applications. This is a great way to consolidate infrastructures onto a few physical machines, reducing costs and making infrastructures more agile and flexible. By consolidating the different applications, servers and desktops onto a virtual environment also consolidates and changes the I/O data path to and from the storage. (e.g. VMware vSphere, VMware Horizon, Microsoft Hyper-V, Citrix Xenserver, Cloudstack, Openstack, etc.)

Generic Enterprise Fileservers – stores small to large files from all kinds of applications. Often you will see documents, pictures, media files and such. Any sort of file saved by applications and usually accessed over SMB/CIFS/NFS protocol.

Back-up & Archive – back-up and archiving systems have two distinct and complementary functions within an enterprise: backup for high-speed copy and restore to minimize the impact of failures, human error or disaster; and archiving to effectively manage data for retention and long-term access and retrieval. (e.g. Veeam, Commvault, Veritas, etc.)

Pic1 blog

Understanding workloads

Understanding workloads and specifically the Disk I/O pattern of enterprise applications helps tremendous with designing and sizing your storage solution(s). So you can make sure that your users, with their applications, are being optimal supported by the storage system. Vendors of enterprise application software often do not inform users about their workload characteristics. This is because the same application may generate different workload patterns depending upon the user’s configuration, or they simply do not know!

Workload characteristics

In this post we will just do a high level overview of the different workload characteristics and the relationship between them. In the next post we will dive deeper into the technical background of speed, throughput and response.

pic2

Workloads have several characteristics that define the workload type:

  • Speed – measured in IOPS (Input/Output Per Second), defines the IOs per second. Read and/or Write IOs can be of different patterns (for example, sequential and random). The higher the IOPS the better the performance.
  • Throughput – measured in MB/s, defines data transfer rate also often called bandwidth. The higher the throughput, the more data that can be processed per second.
  • Response – measured in time like ns/us/ms latency, defines the amount of time the IO needs to complete. The lower the latency the faster a system/application/data responds, the more fluid it looks to the user. There are many latency measurements that can be taken throughout the whole data path.

 Random versus Sequential access pattern

While looking at your applications and how they access their data gives you a good indication if the access pattern is random or sequential. Sequential access means all data blocks will be accessed/written after each other so 1 > 2 > 3 > 4 > 5 where Random access can mean 5 > 1 > 3 > 2 > 4. Accessing data sequentially is much faster than accessing it randomly, because of the way disk hardware works. With spinning disk the seek operation takes the most time in the whole I/O process. The disk head needs to position itself at the correct disk platter to access the requested data. Randomly reading data takes a larger number of seek operations than sequential reading, meaning that the throughput with random will be much lower than sequential. The same applies to random writing. Examining the used workloads might be useful for designing and sizing the storage system. You could use for instance Lakeside Systrack software to give a good indication in the way the workload runs and interacts with the storage.

pic3

I/O Size in the mix

The amount of throughput we can achieve is dependent on the pattern to be random or sequential and the I/O size that is being issued. For a workload with an I/O size of 4k block-size you can calculate the throughput by multiplying the number of IOPS times the I/O size.

Throughput in MB/sec = (IOPS x I/O size) / 1024

So 10.000 IOPS with an 4k block-size will be (10.000 x 4k)/1024 = 39.06 MB/sec throughput you can achieve. The whole data path should be addressed too get a clear insight and more accurate number though!

Looking at a random workload you will see that latency will start to kick in which will reduce the number of IOPS that can be achieved by the storage so random IOPS can reduce the amount of throughput significantly. A SSD produces much more IOPS and throughput against very low latency, because it does not contain moving parts. So why not go all flash than? Its all about knowing and understanding your workloads and deciphering if you need that amount of power against the cost associated with it. Balancing costs against requirements is the best way to go forward.

Classifying workloads

Lets start classifying the different workloads we identified, with the different characteristics that are important for storage. Looking at the four most common workloads:

Transactional Enterprise Applications

Transactional Enterprise Applications often are backed by a database where response is key. Looking at an Oracle database you will see that by decreasing latency will reduce the Wait states and by decreasing the Wait states the usage of CPU capacity will be reduced. If you are running virtualized Oracle database you might want to consider reducing latency and free up CPU cycles so you can run more database on that same CPU, therefore reducing your Oracle licensing cost.

Transactional Enterprise Applications tend to have a high amount of small random read I/O and a desire for fast response (low latency).

Virtualization

Virtualization is a unique workload because it consolidates several different application workloads with their corresponding characteristics on the hypervisor. So you will have several applications doing for instance random 4k, 8k and 16k blocks within the VM but the hypervisor absorbs those in memory and transforms them into 1MB or 2MB blocks which it sends and requests from the storage. VMware ESXi with VMFS5 uses 1MB unified blocks, while Microsoft Hyper-V uses 2MB blocks.

Virtualization and its consolidation also bring some storage benefit because not every application workload is going to peak at the same time. So rather than over provision it is better to look at the total picture.

Virtualization workloads tend to be fully random with a desire for speed (IOPS).

 Generic Enterprise Fileserver

Generally a File Share is used by lots of users in an organization and different applications storing their data on those file shares. Users will open a file and work in/with it for some time before saving it to the file share again. So throughput to get and save the file for the user is often most important. But depending on the applications storing files it can also be that speed is required in terms of IOPS.

Generic Enterprise Fileserver workloads tend to be often sequential, but can be random with a desire for throughput (MB/s). 

 Back-up & Archive

Back-up & Archive workloads look like a workload that would benefit the most by maximizing throughput. This is really depending on how you use the application and which options you configure.

Lets look at for instance Veeam, the performance of many Veeam Backup & Replication functions, such as Reverse Incremental, Synthetic Full, SureBackup and Instant Restore are most impact by the ability of the storage array to deliver random IOPS.

Veeam Backup & Replication offers two primary modes for storing backups to disk, Forward and Reverse incremental. Due to the differences in how these modes write backups to disk they have very different storage I/O profiles.

Forward Incremental backups offer the advantage that they perform only sequential writes to the target repository meaning that performance is significantly higher than reverse incremental backups. However, this backup mode does come with costs, primarily in the requirement to schedule periodic full backups. These backups will take additional time to create and, based on the method, addition space on the target repository.

There are three options for creating new full backups:

Synthetic Full — this method uses the most recent full backup, and any incremental backups created since then and builds a new full backup using this data. This requires space for a new full backup, and random I/O on the target repository and can take a long time to process.

Synthetic Full w/Transform — this method uses the most recent full backup, and any incremental backups created since then and builds a new full backup using this data, while converting the incremental backups to reverse incremental files. This requires only a small amount of additional space on the target repository, but usage a large amount of random I/O and can take a very long time to process

Active Full — this method simply runs a new full backup by reading all data from the source VMs. This requires I/O on the source storage, enough space to store a new full backup and sequential write I/O on the target repository.

Back-up & Archive workloads tend to be often sequential, but can be fully random with a high desire for throughput (MB/s) and speed (IOPS).

Summary

Interestingly, workload characterization is not only useful for sizing and designing storage systems but also useful for application developers to help them optimize their I/O routines or to document best practices based on such analysis. In the next part of Three Dimensions of Storage Sizing & Design we will dive deeper into workload characteristics speed (IOPS), throughput (MB/s) and Response (ms).

 

Posted in Uncategorized

Raise Your SDS IQ (5 of 6): Practical Review of Virtual Storage Appliances

By Michael Letschin, Field CTO

This is the fifth of six posts (the last one was Hyperconverged “SDS”) where we’re going to cover some practical details that help raise your SDS IQ and enable you to select the SDS solution that will deliver Storage on Your Terms. The fifth SDS flavor in our series is Virtual Storage Appliances.

One of the lesser known flavors of SDS is the Virtual Storage Appliance (VSA); it’s less common because it requires a virtual machine environment (think Microsoft’s Hyper-V or VMware), and because there are only a few, software-only options for it (like FreeNAS, LeftHand VSA, or Nexenta’s NexentaStor).   There also tend to be feature limitations as a result of the virtual machine environment, such as lack of replication across hosts. That said, for the right use case – such as remote offices, branch offices, small and medium-sized businesses, and multitenant apps – a virtual storage appliance (VSA) can offer a cost-effective storage solution. To use the VMware example, multiple head nodes each connect with vSphere and a VSA to present storage as one or more pools, with data from one or more VSAs.

Virtual storage appliances offer excellent flexibility through the hypervisor, because you can choose which one you’d like. The hypervisor host does confine scalability to VM size, that’s an important limitation. While managing individual VSAs across different hosts can bring some challenges, it also supports flexibility. You can spin up VSAs per group or per application to create multitenancy using VSAs. IT maintains control, even though individual teams might think they’re managing their storage pools. This approach offers good performance: even a loaded hypervisor only has minimal impact. And finally, this model doesn’t require any additional hardware, so there’s a nice cost benefit.

Overall grade: C

See below for a typical build and the report card:

Screen Shot 2016-06-21 at 1.58.47 PM

Tagged with: , , , ,
Posted in Raise Your SDS IQ, Software-defined data center, Software-defined storage, Uncategorized

Raise Your SDS IQ (4 of 6): Practical Review of Hyperconverged “SDS”

by Michael Letschin, Field CTO

This is the fourth of six posts (the last one was Scale-out) where we’re going to cover some practical details that help raise your SDS IQ and enable you to select the SDS solution that will deliver Storage on Your Terms. The fourth SDS flavor in our series is Hyperconverged “SDS”.

Hyperconverged systems are the subject of much industry hype and analyst debate. Some consider hyperconverged systems to be a form of SDS, others keep them out of the category for not having software-only options. What to know: a hyperconverged system is a single integrated hardware and software system comprising multiple head nodes that present all storage as one virtual pool (think Nutanix or VMware’s EVO Rail). This means that some of the software-only SDS benefits – like flexibility and cost effectiveness – are severely limited.

That said, because it’s fast and easy to set up and drop in a hyperconverged system, it’s a good choice for branch offices or green-field deployments, where there are no existing storage systems to integrate with. Hyperconverged systems are somewhat of a “black box”– meaning you’re not going to have access to software to tune – but you can dial up the performance by increasing the number of nodes.

The downside of Hyperconverged “SDS” is that it’s difficult to impossible to change the system later. Hyperconverged “SDS” provides building block only. You buy what the vendor is selling, which narrows your options. Because you’re tied to a vendor and their pricing models, cost efficiency is also limited. Plus, you’ll need to buy equal amounts of storage and compute capacity. Unless you’re an organization where requirements for storage and compute capacity scale in perfect step, this means you’ll end up with too much of one or the other, wasting part of your investment.

Overall grade: C

See below for a typical build and the report card:

Screen Shot 2016-06-14 at 10.19.33 AM

Tagged with: , ,
Posted in Raise Your SDS IQ, Software-defined data center, Software-defined storage

Raise Your SDS IQ (3 of 6): Practical Review of Scale-out

This is the third of six posts (the last one was Scale-up Software-Only “SDS”) where we’re going to cover some practical details that help raise your SDS IQ and enable you to select the SDS solution that will deliver Storage on Your Terms. The third SDS flavor in our series is Scale-out.

Scale-out is a fundamentally different approach from scale-up; with Scale-out, multiple head nodes can be attached over the network to dramatically increase scalability. This is a broad category, and solutions for it could be either vendor-defined / hardware based (think EMC’s Isilon), or software-only (Nexenta’s NexentaEdge); while we’d consider the software-only approach to be the SDS version, the technical benefits of either type of scale out are similar. You use low-latency networking to connect as many nodes as you want and form a cluster that provides storage services out to applications as unified name space.

The scale-out approach works well for Archive or Web 2.0 applications use cases. Scalability is top notch, because you can start small and grow just by adding nodes. While it provides the performance needed to handle huge capacities, there’s an important dependence on the network – the quality of your gear will significantly impact performance, because of the amount of communication between nodes; that may mean that your IOPS aren’t great

The flexibility of scale-out SDS is generally good but currently offers limited protocol support. Often the maturity of the platforms themselves limits your flexibility; for example, you can’t use Exchange to write to an object back end. Likewise, object-oriented applications won’t work with some back ends, either. Protocol support considerations also impact the cost effectiveness of Scale-out: they may restrict your hardware choices and lock you in to more expensive purchases.

Overall grade: B

See below for a typical build and the report card:

Screen Shot 2016-06-07 at 11.26.20 AM

Tagged with: , , , , ,
Posted in Raise Your SDS IQ, Software-defined data center, Software-defined storage

Lenovo Selects Nexenta for Lenovo StorSelect Program; Introduces Joint OpenSDS-Based Enterprise Storage Solution

Earlier today, you may have heard about Lenovo’s StorSelect Program, which allows customers to confidently deploy new storage technologies using pre-loaded appliances with software from best-of-breed, Independent Software Vendors (ISVs) on proven Lenovo systems. The StorSelect solutions enable easy expansion for scalable deployments, as well as integration with existing storage infrastructures. The program will also offer quality, engineering expertise and access to Lenovo’s worldwide global services.

In March 2016, we announced our partnership with Lenovo as part of this program with a shared initiative to drive broader adoption of next-generation software storage solutions. We introduced a joint solution that integrates our award-winning, software-defined storage (SDS) with Lenovo x86 servers: The Lenovo Storage DX8200N powered by NexentaStor.

The Lenovo Storage DX8200N powered by NexentaStor allows you to accelerate and simplify unified file and block storage deployments. At the same time, this pre-validated turnkey solution provides easy scalability and simplified management at a fraction of the cost of legacy systems—without trade-offs in availability, reliability or functionality.

The integrated solutions will offer Lenovo customers worldwide, both scale-up and scale-out reference architectures on all flash, hybrid and spinning media systems. These solutions with Lenovo’s industry leading service and support will deliver a new SDS standard.

Target use cases for the Lenovo Storage DX8200N powered by NexentaStor solution include:

  • Unified File (NFS and SMB) and Block (Fibre Channel and iSCSI) services
  • VMware cloud backend storage
  • OpenStack and CloudStack backend storage
  • Generic NAS file services and Home Directory storage
  • Near-line archive and large scale backup repositories

“Lenovo recognizes Software-Defined Storage as the strategic storage solution to deliver revolutionary economics for enterprise and cloud storage needs,” said David Lincoln, General Manager of the Storage Business Unit at Lenovo. “We have selected Nexenta as a key partner for our StorSelect Program and are leveraging Nexenta’s deep storage software R&D skills and Lenovo’s proven enterprise servers to deliver an innovative, trusted, and scalable enterprise storage solution. Lenovo Storage DX8200N powered by NexentaStor will give customers the freedom to scale their data to meet business needs today and tomorrow.”

While pricing will vary on capacity and media types, you can expect affordable cents per gigabyte pricing for complete systems including three years of technical support.

Additional Resources:

Lenovo Storage DX8200N powered by NexentaStor data sheet

Lenovo StorSelect video

Lenovo StorSelect 3D tour

Lenovo StorSelect analyst paper

Lenovo and Nexenta: Strategic Partnership to Redefine Open Software-Defined Storage Solutions webinar

Lenovo StorSelect blog

If you’re planning to attend Lenovo’s Tech World conference on June 9, 2016 in San Francisco, please visit the Data Center Group showcase at the Fairmont to see demos of the new Lenovo Storage DX8200N powered by NexentaStor solution.

Posted in Uncategorized

Raise Your SDS IQ (2 of 6): Practical Review of Scale-Up Software-Only SDS

by Michael Letschin, Field CTO

This is the second of six posts (the last one was Scale-up Vendor-Defined “SDS”) where we’re going to cover some practical details that help raise your SDS IQ and enable you to select the SDS solution that will deliver Storage on Your Terms.  The second SDS flavor in our series is Scale-up Software-Only SDS.

Scale-up Software-Only SDS is just that – SDS benefits delivered via a software-only approach; the software sits on industry standard servers (think Cisco, Dell, Supermicro) and can leverage a variety of JBODs (like Fujitsu, Supermicro, Quanta).  The end result is similar to what you get with Scale-up Vendor-Defined “SDS” — one or two commodity head nodes with JBOD behind it – but with two big differences: much better cost efficiency because it’s more vendor agnostic, and greater opportunity for scale up by leveraging big disks for the JBODs (like Supermicro’s 90-bay). Both provide REST-based management with rich APIs via front-end software, to enable easy, automatic provisioning and management of storage. Unlike Vendor-Defined, Scale-up Software-Only SDS offers good scalability and excellent cost efficiency and flexibility because of it’s software-only approach.

Scale-up Software-Only SDS is an excellent option for companies running virtual machines or the legacy apps that help run their business. It’s a good choice if you have accounting applications and legacy hardware because its performance in this use case is excellent. Unlike vendor-defined options, this one gives you the flexibility to design your solution as needed; and that applies both to the solution you need today, as well as creating flexibility for the future.. Plus, you can choose the most cost-effective commodity-priced machines rather than paying a premium for proprietary hardware – we’ve seen as much as 50-80% cost difference between identical solutions of the different types.  Scale-up Software-Only SDS is limited to one or two head nodes, but you can put larger drives behind them, so you can scale up pretty big, but obviously scale out is limited.

Overall grade: A-

See below for a typical build and the report card:

Screen Shot 2016-05-24 at 10.11.24 AM

Tagged with: , ,
Posted in Raise Your SDS IQ, Software-defined storage

Is 2016 the year of the software-defined data center? Part 2

In the last blog we determined that the world has changed since the early 2000s and that x86 has won the server war, now giving you a chance to have vendors compete for you. But that’s only part of the data center — we still have to look at the rest of the infrastructure.

Networks that can shift and deliver at the same time

In the past, if you wanted to have the fastest network infrastructure you had to buy from one or two specific vendors. In reality, they still have a huge share of the networking market, but we’ve seen a transition in recent years.

That being said, it is not just about the software layer — with scale-out systems, the network has become critical for more than just inter-rack and inter-data center connectivity, and critical for processing power and speed as well. The move from 1GB to 10GB to 40GB, and even up to 100GB, is allowing for larger amounts of traffic than ever before to move around the globe. This is all done at a scale that can give even the smallest enterprise access to faster connectivity, but it still comes at a cost.

While 10GB is starting to become commonplace, it is still a much more expensive option than 1GB and not something that has trickled down to the end user yet. The true question is, does it need to? If the data center is driven by commodity hardware that is governed by software and all the compute happens there, do we really need to be faster at the endpoint? To answer that, we will have to wait and see.

The brains behind the brawn

The No. 1 buzzword of the last few years has been “software-defined,” and rings true for the transition we have seen in the data center over the last 10 years. The first step is compute, with virtualization taking its place as a stalwart to provide the processing power needed on commodity hardware.

The next two parts that need to move to the software-defined space to allow for a commodity-driven data center are storage and networking. Some would argue that we are already there, with executives from major storage vendors making statements as early as 2010 that “we all run on commodity hardware.”

In reality, you are normally limited to the hardware that a vendor chooses. Moving into a software-defined model means we have choice in vendors and can truly liberate the storage needs from the hardware vendor’s grasp. The growth of open source-based solutions has helped this along the way, and we will see if 2016 is the year we take the leap to make software-defined the norm.

Networking, on the other hand, is the last step forward. You cannot buy a blank switch off the shelf and just layer on the software of your choosing and make it work at the speed and capacity that you desire, but we are starting to see a transition toward this. The combination of software-defined solutions is what gives companies the agility to transition to the enterprises of tomorrow, providing the ability to custom-fit a solution into the business instead of the other way around. It appears that software-defined is more than just a buzzword.

So is it the year?

My gut says the commodity-based data center is not ready to go mainstream, but for intrepid companies willing to source the right gear, it may be. We have seen so many changes in the data center since the ’90s and the early 2000s that I have to believe the pace of change will continue.

Before we cross into 2020, the data center will be in the domain of the business again, not the vendors. That will be the commodity data center that anyone who has managed a data center in the last 20 years has been dreaming about.

Posted in Software-defined data center, Software-defined storage