Moving data between virtual machines
Hidden Information
The easiest virtual environment scenario is based on a single physical server that hosts multiple virtual machines (VMs), but capturing the data traffic within a single physical computer is very difficult. The packets exchanged between the virtual machines on the same server never actually leave the physical server. For this reason, a physical span port on the switch is not much use for logging the data streams.
However, developers have come up with a solution for this analysis problem: a virtual switch with an integrated span port. This setup lets administrators define a network interface controller (NIC) on the virtual switch as a target for the traffic they want to log.
You can use a vNIC running on a virtual machine on the server, or you can use a pNIC to transfer the packets to an external sniffer.
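As a rough illustration of such a virtual span port, the following Python sketch assembles an `ovs-vsctl` call that mirrors all traffic on a bridge to one output port. It assumes the virtual switch is Open vSwitch; the bridge name `br0`, the port name `tap-sniffer`, and the mirror name `capture0` are hypothetical placeholders, and the helper function is an illustration rather than part of any tool.

```python
import shlex


def build_ovs_mirror_command(bridge, output_port, mirror_name="capture0"):
    """Return an ovs-vsctl argument list that mirrors all traffic on
    `bridge` to `output_port` (a vNIC or pNIC attached to the bridge)."""
    return [
        "ovs-vsctl",
        # Attach the mirror record (@m, created below) to the bridge.
        "--", "set", "Bridge", bridge, "mirrors=@m",
        # Look up the port that will receive the mirrored packets.
        "--", "--id=@out", "get", "Port", output_port,
        # Create the mirror: select everything, send it to the output port.
        "--", "--id=@m", "create", "Mirror", f"name={mirror_name}",
        "select-all=true", "output-port=@out",
    ]


if __name__ == "__main__":
    # br0 and tap-sniffer are assumed names; substitute your own.
    print(shlex.join(build_ovs_mirror_command("br0", "tap-sniffer")))
```

Printing the command instead of executing it lets you review the call before applying it to a production switch.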
Two Approaches to Monitoring
The benefit of using a VM as your packet collector on the server is that you don't need any additional hardware. The drawback, however, is that this approach generates additional data traffic on the virtual switches and possibly requires additional storage space to keep the packets that you log on disk. Network analysis is supposed to be a passive process, but because the sniffer runs on the same physical server as the machines it observes, the captured data is stored on that server's local hard disk, which could have consequences for the entire VM server.
Alternatively, you can log the data in non-promiscuous mode within the VM itself. To do so, you need to create a capture filter on the virtual machine in which you will be logging the packets. Thanks to tcpdump, sniffing on the Linux-based virtual machine is fairly simple. It makes sense to use the -w option, which tells the software to write the packets it logs to a pcap file. You can then open the file with any network analysis tool and quickly and easily evaluate the results.
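The tcpdump call described above could be assembled as in the following Python sketch. The interface name, file name, and filter expression are assumptions chosen for illustration. Besides `-w`, the sketch uses the tcpdump options `-i` (interface), `-p` (do not put the interface into promiscuous mode), and `-c` (stop after a number of packets).

```python
def build_tcpdump_command(interface, pcap_path, capture_filter="",
                          promiscuous=False, max_packets=None):
    """Return a tcpdump argument list that writes captured packets to a pcap file."""
    cmd = ["tcpdump", "-i", interface, "-w", pcap_path]
    if not promiscuous:
        # -p asks tcpdump not to switch the interface to promiscuous mode.
        cmd.append("-p")
    if max_packets is not None:
        # -c stops the capture after the given number of packets.
        cmd += ["-c", str(max_packets)]
    if capture_filter:
        # The capture filter is passed as a single trailing expression.
        cmd.append(capture_filter)
    return cmd


if __name__ == "__main__":
    # eth0, vm-trace.pcap, and the filter are hypothetical example values.
    print(build_tcpdump_command("eth0", "vm-trace.pcap",
                                "host 10.0.0.12 and tcp port 443",
                                max_packets=1000))
```

The resulting list can be handed to `subprocess.run()` on the virtual machine; capturing usually requires root privileges.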
In practical applications, you will probably also have cases in which the packets exchanged between two virtual machines on a physical server are forwarded to a target outside of the box: Most virtual switches only work at Layer 2. This means that traffic between virtual machines on different VLANs has to be routed, so it is forwarded to the default gateway that connects the VLANs.
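Whether a given flow stays on the virtual switch or takes the detour through the gateway can be estimated with a few lines of Python. This sketch assumes that each VLAN corresponds to exactly one IP subnet and that the virtual switch does no routing of its own; the addresses and the function name are made up for the example.

```python
import ipaddress


def needs_gateway(src_ip, dst_ip, prefix_len):
    """Return True if traffic from src_ip to dst_ip must be routed,
    that is, if dst_ip lies outside the sender's own subnet."""
    src_network = ipaddress.ip_interface(f"{src_ip}/{prefix_len}").network
    return ipaddress.ip_address(dst_ip) not in src_network


if __name__ == "__main__":
    # Same subnet: the frame is switched locally at Layer 2.
    print(needs_gateway("10.0.10.5", "10.0.10.9", 24))
    # Different subnet: the packet goes to the default gateway first.
    print(needs_gateway("10.0.10.5", "10.0.20.9", 24))
```

If the function returns True, a capture on the gateway's physical link is likely to show the traffic even though both virtual machines share a host.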
Overlay Traffic
In a networked, virtual environment, distributed virtual switches are used between virtual machine servers. A virtual switch (VSwitch) works in a similar way to a physical Ethernet switch. It knows which virtual machines are logically connected to which virtual ports, and it uses this information to forward data to the correct virtual machine. A VSwitch can be connected via physical Ethernet adapters (a.k.a. uplink adapters) to physical switches, thereby connecting virtual and physical networks.
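The forwarding behavior described here can be modeled in a few lines. The following Python sketch is a deliberately simplified toy model, not how any particular product is implemented: it keeps a static table of MAC addresses to virtual ports and sends frames for unknown destinations to a single uplink. Class, method, and port names are hypothetical.

```python
class VirtualSwitch:
    """Toy model of a virtual switch with one uplink adapter."""

    def __init__(self, uplink_port="uplink0"):
        self.uplink_port = uplink_port
        self.port_by_mac = {}  # MAC address -> virtual port name

    def attach(self, mac, port):
        """Register the virtual port to which a VM's vNIC is connected."""
        self.port_by_mac[mac.lower()] = port

    def forward(self, dst_mac):
        """Return the port for a destination MAC. Destinations that are
        not local VMs are assumed to live on the physical network."""
        return self.port_by_mac.get(dst_mac.lower(), self.uplink_port)


if __name__ == "__main__":
    vswitch = VirtualSwitch()
    vswitch.attach("52:54:00:AA:00:01", "vport1")
    vswitch.attach("52:54:00:AA:00:02", "vport2")
    print(vswitch.forward("52:54:00:aa:00:02"))  # local VM
    print(vswitch.forward("52:54:00:bb:00:09"))  # not local, goes to uplink
```

The model leaves out broadcasts, VLAN tags, and multiple uplinks; it only illustrates why frames between two local VMs never reach the physical switch.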
This connection is similar to networking physical switches to create larger networks. The traffic here is encapsulated; the physical server addresses are used, and the internal addresses of the virtual machines are masked. The idea behind this is to achieve more efficient networking between virtual machines by establishing a common VSwitch Forwarding Information Base (FIB) for all VM servers. This setup also means that the VM infrastructure communicates on separate VLANs, and the configuration of the physical switches does not need to be changed to support this.
Various problems occur when you need to sniff packets in this type of computing environment. To begin, you need to define which physical servers (on which the VMs are running) need to be analyzed. Then, you need to specify suitable capture filters. Because the entire data traffic between the participating servers is encapsulated, you end up logging all the data traffic between the servers and not just the data for a specific virtual machine. Because many virtual machine server farms use 10Gb Ethernet, this is not a trivial undertaking.
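To pick out one virtual machine's traffic from such a capture, you have to look inside the encapsulation. The text does not name a specific overlay protocol, so the following Python sketch assumes VXLAN purely as an example: it reads the 8-byte VXLAN header from a UDP payload and returns the virtual network identifier (VNI) together with the MAC addresses of the inner Ethernet frame. The function names are hypothetical.

```python
import struct

VXLAN_HEADER_LEN = 8
ETHERNET_HEADER_LEN = 14
VNI_VALID_FLAG = 0x08  # "I" flag: the VNI field is valid


def format_mac(raw):
    """Render six raw bytes as a colon-separated MAC address."""
    return ":".join(f"{byte:02x}" for byte in raw)


def parse_vxlan(udp_payload):
    """Return (vni, inner_dst_mac, inner_src_mac) for a VXLAN UDP payload."""
    if len(udp_payload) < VXLAN_HEADER_LEN + ETHERNET_HEADER_LEN:
        raise ValueError("payload too short for VXLAN and inner Ethernet headers")
    # Layout: 1 byte flags, 3 reserved bytes, 3 bytes VNI, 1 reserved byte.
    flags, vni_field = struct.unpack("!B3xI", udp_payload[:VXLAN_HEADER_LEN])
    if not flags & VNI_VALID_FLAG:
        raise ValueError("VNI flag not set; probably not a VXLAN packet")
    vni = vni_field >> 8  # drop the trailing reserved byte
    inner = udp_payload[VXLAN_HEADER_LEN:]
    return vni, format_mac(inner[0:6]), format_mac(inner[6:12])
```

A post-processing script could apply such a function to every logged packet and keep only those whose inner MAC address belongs to the virtual machine under investigation.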
As I mentioned previously, packet analysis with the aid of virtual switches is preferable. It lets you work around the problems entailed in extracting the packets that you are interested in during packet logging. However, you need to make sure that the packet capture program you enable in the target server infrastructure is actually capable of logging the packets quickly enough. You might find that you need an external device for logging data instead of a capture VM on each server.
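A quick estimate shows what "quickly enough" means in practice. The following Python sketch converts a link speed and an assumed average utilization into the sustained write rate a capture system must handle and the time until a given disk fills up. It ignores the small per-packet overhead of the pcap file format, and the disk size and utilization figures are assumptions for the example.

```python
def capture_write_rate_mb_per_s(link_gbit_per_s, utilization):
    """Sustained write rate in MB/s needed to log a link at the given
    utilization (0.0 to 1.0), counting one direction only."""
    bits_per_second = link_gbit_per_s * 1e9 * utilization
    return bits_per_second / 8 / 1e6


def seconds_until_full(disk_gb, link_gbit_per_s, utilization):
    """Time in seconds until a disk of disk_gb gigabytes is full."""
    rate = capture_write_rate_mb_per_s(link_gbit_per_s, utilization)
    return disk_gb * 1000 / rate


if __name__ == "__main__":
    # A fully loaded 10Gb link produces 1250 MB of capture data per second.
    print(capture_write_rate_mb_per_s(10, 1.0))
    # At 50 percent load, a 500GB disk lasts 800 seconds.
    print(seconds_until_full(500, 10, 0.5))
```

Numbers of this size are one reason why a dedicated external capture device can be the more realistic choice.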
With the aid of overlay functions, you can forward the data traffic that you log to a dedicated port. Alternatively, you can use a remote switch port analyzer (RSPAN) or encapsulated RSPAN (ERSPAN) to export the traffic over the physical network to your external capture hardware. If you need to perform this kind of packet analysis regularly, it makes sense to have virtual taps (TAPs) in place for port mirroring.
Packet Analysis in the Cloud
The cloud is the most extreme example of a virtual machine infrastructure, and one that makes packet analysis particularly difficult. You can basically reduce the topic of packet analysis of cloud data streams to one question: "Who runs the network?" If the virtual machines are provisioned in a public cloud (e.g., hosted by Rackspace or Amazon), you have no access to the virtual or physical switches, and local packet sniffing on the virtual machine is your only option.
Some restrictions apply here, too: Packet logging is only possible if you have an Infrastructure as a Service (IaaS) solution. In the case of Platform as a Service (PaaS) and Software as a Service (SaaS), you will not typically have access at the operating system level. In this case, you will fail in your attempts to sniff the servers, and your only option remains packet analysis on the client side.
Following all of this bad news, here is the good news: Automation of virtual machines in the cloud – in combination with a couple of scripts – lets you control a packet capture agent remotely. If you run a cloud environment in your own enterprise (private cloud), then the scenarios described at the start of this article apply to sniffing. The cloud solution that your enterprise uses might not support port mirroring on the virtual switches; however, you can easily change the status quo in a private cloud. For example, OpenStack has supported port mirroring on multiple virtual switches for some time now.
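One way to control a capture agent remotely with a script is sketched below in Python. It assumes that tcpdump serves as the agent, that it is installed on the cloud VM, and that the account has SSH access and sufficient privileges to capture; the hostname, user name, and function names are hypothetical. The remote tcpdump writes the pcap stream to standard output (`-w -`), and `-U` flushes each packet as soon as it is captured so that the local file grows continuously.

```python
import shlex
import subprocess


def build_remote_capture_command(host, interface, capture_filter="", user="admin"):
    """Return an ssh argument list that runs tcpdump on a remote VM and
    streams the pcap data back over the SSH connection."""
    remote = ["tcpdump", "-i", interface, "-U", "-w", "-"]
    if capture_filter:
        # Quote the filter so the remote shell passes it on as one argument.
        remote.append(shlex.quote(capture_filter))
    return ["ssh", f"{user}@{host}", " ".join(remote)]


def capture_to_file(command, local_path):
    """Run the capture and store the streamed pcap data in a local file."""
    with open(local_path, "wb") as out:
        return subprocess.run(command, stdout=out, check=False).returncode


if __name__ == "__main__":
    # vm1.example.com and ops are placeholder values.
    print(build_remote_capture_command("vm1.example.com", "eth0",
                                       "tcp port 80", user="ops"))
```

Because the packets are stored on the machine that runs the script, the capture does not consume disk space on the cloud VM, although it does add traffic to the VM's network connection.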
Conclusions
In virtualization, the focus is on simplifying processes and not on controlling and monitoring critical parameters. Fortunately, this situation is changing slowly: Open vSwitch and VMware (as of version 5) support port mirroring, which means you can now also sniff data streams in virtual machine environments.