VMware View 4.5 real-time usage statistics

January 4, 2011

I was somewhat disappointed by the lack of usage information in View 4.5, specifically that I had to log in to the admin side of a connection server to find out how many machines in a given pool were in use.

View stores its data in ADAM, and in searching the net I haven't found a good way to query it. What I'd like is a web page that shows how many desktops in a given pool are in use or available.

I had already started parsing the View Event database for historical statistics (users per day, average length of time logged on, busiest days, etc.), but nothing in there offered an easy way to just say “hey, there are 75 people on right now.”
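
As a rough illustration of those historical numbers, here is a minimal Python sketch. It assumes the event rows have already been pulled out of the View Event database as hypothetical (timestamp, user, event_type) tuples, and it pairs each BROKER_USERLOGGEDIN with the same user's next BROKER_USERLOGGEDOUT; the real schema has more columns and may need different pairing rules.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical row shape pulled from the event database: (timestamp, user, event_type).
Event = tuple[datetime, str, str]


def session_stats(events: list[Event]) -> tuple[dict[str, int], timedelta]:
    """Distinct users per day and average session length.

    Each BROKER_USERLOGGEDIN is paired with the same user's next
    BROKER_USERLOGGEDOUT; unmatched logins are treated as still open.
    """
    users_per_day: dict[str, set[str]] = defaultdict(set)
    open_sessions: dict[str, datetime] = {}
    durations: list[timedelta] = []
    for ts, user, etype in sorted(events):  # process in time order
        if etype == "BROKER_USERLOGGEDIN":
            users_per_day[ts.date().isoformat()].add(user)
            open_sessions[user] = ts
        elif etype == "BROKER_USERLOGGEDOUT" and user in open_sessions:
            durations.append(ts - open_sessions.pop(user))
    average = sum(durations, timedelta()) / len(durations) if durations else timedelta()
    return {day: len(users) for day, users in users_per_day.items()}, average
```

The busiest day is then just `max(per_day, key=per_day.get)` on the first value returned.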

What I came up with (this may or may not work in your environment) was to put an AFTER INSERT trigger on the viewevent table. The trigger increments or decrements a counter in a table on a remote linked server that my web server can access.

At this point I am not looking at specific pools, only overall usage. There are a few event types in the log table that you can use and it all depends on which one you want to pick as the definitive “a user logged on” event for the trigger:

  • AGENT_CONNECTED
  • BROKER_DESKTOP_REQUEST
  • BROKER_USERLOGGEDIN

For me, the most obvious choice seemed to be basing the counts on the BROKER_USERLOGGEDIN and BROKER_USERLOGGEDOUT event types. The issue with these is that neither the pool nor the desktop is listed in the record, only the user ID of the person logging in and the name of the View Connection Server. If you base your trigger on the AGENT_CONNECTED event type, you have the name of the user and the name of the responding desktop. If you use BROKER_DESKTOP_REQUEST, you have the name of the pool.

I didn't want to make my trigger complex at this point, so I am just getting overall (aggregate) usage across all pools.

Basically, my trigger is as follows (note: I removed the linked server name and some other specific data; this code is NOT guaranteed to run as-is or at all):
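SET ANSI_NULLS ON
SET QUOTED_IDENTIFIER ON
GO
--written by Paul Dunn
--(C) 2011
--will increment/decrement a counter for view usage based on event type
--in the viewevent table
--dunnsept@gmail.com
--(use ALTER TRIGGER instead of CREATE if the trigger already exists)
CREATE TRIGGER [test1] ON [dbo].[viewevent]
AFTER INSERT
AS
BEGIN
    -- SET NOCOUNT ON prevents extra "rows affected" result sets from
    -- interfering with SELECT statements.
    SET NOCOUNT ON;

    DECLARE @logins int, @logouts int;

    -- One INSERT can add several rows, so count logins and logouts across
    -- the whole "inserted" table instead of reading a single row.
    SELECT @logins  = COUNT(CASE WHEN [EventType] = 'BROKER_USERLOGGEDIN'  THEN 1 END),
           @logouts = COUNT(CASE WHEN [EventType] = 'BROKER_USERLOGGEDOUT' THEN 1 END)
    FROM inserted;

    -- view_usage lives on the linked server (name removed), so a four-part
    -- name would go here. Updating in one statement avoids a read-then-write
    -- race, the WHERE clause touches only this counter row, and the CASE
    -- keeps the count from dropping below zero.
    IF @logins <> @logouts
        UPDATE view_usage
        SET num = CASE WHEN num + @logins - @logouts < 0 THEN 0
                       ELSE num + @logins - @logouts END
        WHERE poolid = 1;
END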
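
To sanity-check the increment/decrement rules away from SQL Server, they can be modelled in a few lines of Python. This is a hypothetical stand-in, not part of View: `apply_events` replays a batch of event types against the counter using the rules described above (a login adds one, a logout subtracts one unless the count is already zero, and every other event type is ignored).

```python
def apply_events(count: int, event_types: list[str]) -> int:
    """Replay one batch of viewevent event types against the usage counter."""
    for etype in event_types:
        if etype == "BROKER_USERLOGGEDIN":
            count += 1
        elif etype == "BROKER_USERLOGGEDOUT" and count > 0:
            count -= 1  # never go below zero, like the trigger's guard
        # AGENT_CONNECTED, BROKER_DESKTOP_REQUEST, etc. leave the count alone
    return count


print(apply_events(0, ["BROKER_USERLOGGEDIN"] * 3 + ["AGENT_CONNECTED", "BROKER_USERLOGGEDOUT"]))
```

Feeding it batches of several events is a quick way to see why the trigger has to look at every row in `inserted` rather than just one.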

Now all I have to do is query that value in the table and display it on the page. I have the tables set up so that I can track usage by pool, which makes more sense than all users, but this is a start at least. This should also help get others off my case: I run View 4.5 pools for 3 other departments besides my own, and now I don't have to give them access to the admin side for them to see the usage on their pools.

I am going to rewrite this eventually so that it is pool-based. That shouldn't be a big deal; it just means using a different event type in the table and parsing out the name of the pool. I could even go so far as to grab the name of the logged-in user, although at this point I don't see much use for that.
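
One way the pool-based counting could work, sketched in Python under stated assumptions: the pool name comes from BROKER_DESKTOP_REQUEST (the event type that carries it), each user's logout is matched back to the pool they requested, each user holds one desktop session at a time, and the (event_type, user, pool) tuple shape is hypothetical. It models the counting rules only; it is not a replacement for the trigger.

```python
from collections import Counter
from typing import Optional


def pool_usage(events: list[tuple[str, str, Optional[str]]]) -> Counter:
    """Per-pool count of desktops in use, replaying events in order.

    BROKER_DESKTOP_REQUEST supplies the pool name; BROKER_USERLOGGEDOUT only
    names the user, so each user's pool is remembered until they log out.
    """
    in_use: Counter = Counter()
    user_pool: dict[str, str] = {}
    for etype, user, pool in events:
        if etype == "BROKER_DESKTOP_REQUEST" and pool is not None:
            if user not in user_pool:  # ignore repeat requests (e.g. reconnects)
                user_pool[user] = pool
                in_use[pool] += 1
        elif etype == "BROKER_USERLOGGEDOUT" and user in user_pool:
            in_use[user_pool.pop(user)] -= 1
    return in_use
```

A desktop request that fails would still be counted here, so a real version might prefer AGENT_CONNECTED plus a desktop-to-pool lookup.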

Categories: View, Vmware

VMware View 4.5 pool re-compose storage throughput

October 1, 2010

I just upgraded my VMware View 4.5 RC to GA. After upgrading the Connection Servers and then Composer, I updated the agent on the pool's master image.

Keep in mind this cluster is only running on 4 servers. Somehow the bid/purchase process got screwed up and my 10 new servers never showed. Of course, that pricing is no longer valid. Now I'm waiting on 6 HP DL387s. Oh well.

So, I told View to recompose the pool at 17:53. The pool consists of 100 Windows XP SP3 desktops as linked clones. The ESX servers are connected to a Sun 7410, with 2Gb aggregated links from the servers to the Cisco switches and a 4Gb aggregate to the 7410 head.

According to View Manager, the complete pool recompose took 37 minutes. The Sun shows a peak of 9,898 NFS ops/sec. I didn't add latency to the Analytics page until partway through, but latencies stayed well below the 15 ms mark. In the screenshot that highlights the NFS ops peak, you can see that most of the latencies are in the 0 µs bucket, i.e. between zero and 1 ms.

There is still a bit of NFS traffic afterwards; it didn't drop off very low because vCenter was migrating the desktops around to rebalance the load.

By 18:33 the Sun shows NFS down to < 100 ops/sec, so all in all I'd say 45 minutes total for a 100-desktop pool recompose. I exported the data from the Sun, and a quick look shows an average of 2,849.3 ops/sec over the time the recompose was running.
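
That average is just the mean of the exported samples that fall inside the recompose window. A minimal Python sketch, assuming a hypothetical two-column export (`time,ops_per_sec` with a header row and HH:MM:SS timestamps); the actual Sun Analytics export layout may differ:

```python
import csv
import io
from datetime import datetime


def average_ops(csv_text: str, start: str, end: str) -> float:
    """Mean ops/sec for samples between start and end (inclusive, 'HH:MM:SS')."""
    t0 = datetime.strptime(start, "%H:%M:%S")
    t1 = datetime.strptime(end, "%H:%M:%S")
    samples = [
        float(row["ops_per_sec"])
        for row in csv.DictReader(io.StringIO(csv_text))
        if t0 <= datetime.strptime(row["time"], "%H:%M:%S") <= t1
    ]
    return sum(samples) / len(samples) if samples else 0.0
```

A plain mean like this is only right when the samples are evenly spaced; with irregular sampling, each sample should be weighted by its interval instead.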

Categories: Storage, View, Vmware

vscsiStats during an SAP Client Copy

March 11, 2010

I gathered some vscsiStats during an SAP ECC 6.0 Client 0 copy this morning. I did this because I wanted not only to see how my storage subsystem was handling the load, but also to test out the new version of the Excel macro Matt and I had worked on.

This is a fresh Oracle-based SAP ECC 6.0 install. There are already 2 IDES clients in place, and this client copy will be for a BPI client for the summer. Build details:

  • Windows 2003 x64
  • 16 GB RAM
  • 60 GB C: drive, 500 GB data partition
  • Quad Xeon 2.9 GHz

All running on ESX 4.0 on a Sun Blade x6250 in a 6000 chassis, storage provided by a Sun 7410C.

As soon as I had the job scheduled, I started gathering vscsiStats on the box. I collected stats for as long as the client copy took to complete: 110 minutes.

The HTML output of the Excel macro is viewable as index4558. Note that, unlike the HTML original, you can't click on anything in the PDF.

On to the data. The first chart shows that the mean block size is 8 KB.

Latency is generally between 5 ms and 15 ms. Keep in mind this is a combination of reads and writes. I think my read cache is keeping up, but once in a while the writes are overrunning the write cache. The storage has 200 GB of read cache per head and 3x 18 GB write SSDs per storage pool.

While I was collecting vscsiStats, I also had Sun Storage Analytics open. Overall, the client copy moved about 50 GB of data in 110 minutes. In this view of Sun Analytics you can see the effect of the client copy in the highlighted peak of 10,341 NFSv3 ops per sec.

This graph shows the overall impact of the client copy on storage; I've added an arrow at the time I started the client copy job. The screenshot does not go out to the end of the copy, but traffic quieted right down when the job finished.

In total, over the 110 minutes of the client copy, 30 GB was written and 18 GB read.
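
Those totals translate into averages with a little arithmetic. This Python sketch assumes binary units (1 GB = 1024 MB, 1 MB = 1024 KB) and, for the IOPS estimate, that every I/O is the 8 KB mean size from the first chart, so the IOPS figure is only a rough estimate.

```python
def copy_rates(written_gb: float, read_gb: float, minutes: float,
               mean_io_kb: float = 8.0) -> tuple[float, float]:
    """Average throughput in MB/s and implied I/Os per second for a bulk job."""
    seconds = minutes * 60
    total_mb = (written_gb + read_gb) * 1024        # GB -> MB (binary units)
    mb_per_sec = total_mb / seconds
    iops = total_mb * 1024 / mean_io_kb / seconds   # MB -> KB, then / I/O size
    return mb_per_sec, iops


mb_s, iops = copy_rates(written_gb=30, read_gb=18, minutes=110)
print(f"{mb_s:.2f} MB/s, about {iops:.0f} IOPS")  # 7.45 MB/s, about 953 IOPS
```

With the figures from this post that works out to roughly 7.4 MB/s and about 950 I/Os per second on average, far below the 10,341 ops/sec peak in the Analytics view.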

Categories: SAP, Storage, Vmware

New vscsiStats Excel Macro

March 11, 2010

I wrote an Excel macro to process vscsiStats data and turn it into pretty charts and graphs. I shared that macro with my friend Matt Kelliher, and after I showed him how to use it, he suggested and made a modification to it. The latest version still processes the data and creates charts, but it also exports the charts as PNGs, creates an HTML file, and puts thumbnails of the charts in it. You can then click on any of the charts for a full-screen view. It's a handy way of presenting the data in a concise format.
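
The first step of the macro, pulling a histogram out of vscsiStats CSV output, can be sketched in Python as well. The layout assumed here (a `Frequency,Histogram Bucket Limit` header followed by `frequency,limit` rows) should be checked against your own `vscsiStats -p ... -c` files; the function names are hypothetical and this is not the macro itself.

```python
def parse_histogram(csv_text: str) -> list[tuple[int, int]]:
    """Return (bucket_limit, frequency) pairs from the first histogram found.

    Parsing starts after the 'Frequency,Histogram Bucket Limit' header and
    stops at the first line that is not two integers.
    """
    pairs: list[tuple[int, int]] = []
    in_table = False
    for line in csv_text.splitlines():
        fields = [f.strip() for f in line.split(",")]
        if fields[:2] == ["Frequency", "Histogram Bucket Limit"]:
            in_table = True
            continue
        if in_table:
            try:
                freq, limit = int(fields[0]), int(fields[1])
            except (ValueError, IndexError):
                break  # blank line or next histogram header
            pairs.append((limit, freq))
    return pairs


def busiest_bucket(pairs: list[tuple[int, int]]) -> int:
    """Bucket limit with the highest frequency (raises ValueError if empty)."""
    return max(pairs, key=lambda p: p[1])[0]
```

From the pairs, each chart is just bucket limits on one axis and frequencies on the other.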

Let me know what you think of the file; you may download a copy here. You are welcome to download and use this macro, but please leave the comments at the top (and feel free to buy us a beer or two).

This macro has been tested in Excel 2007 and the 2010 beta. I must say, though, it runs much more slowly in these than in 2003.

Note: SAVE your spreadsheet before running this version of the macro; it uses the current save location as the starting point for creating the HTML and saving the images.

Categories: Storage, Vmware

VMware Performance

February 19, 2010

I posted yesterday that I was seeing a huge amount of storage traffic on my View Manager box. It turns out that it was due to two things: a snapshot and a vswp file.

This morning I checked a few settings before proceeding. According to vCenter, the View Manager box only had 600 MB of RAM allocated, but Task Manager on that Windows 2003 box showed 2.9 GB in use and committed, mainly page file. Strange; why the disconnect between the two?

It turns out that when the resource pool was created for the View Manager server, the reservations were never changed. The pool had a default memory reservation of 600 MB (all set by someone else, I might add). So the ESX host was only giving the View Manager server 600 MB of physical RAM and was using a vswp file to make up the rest of the 2 GB allocation.

I shut the server down, removed it from the resource pool, set the reservation on the pool higher, put the server back in, and then set a reservation on the server itself that matched its allocated physical RAM.
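
ESX sizes a VM's .vswp file as its configured memory minus its memory reservation, so a reservation equal to the allocation leaves no swap file to page into. A tiny Python sketch of that arithmetic, treating 2 GB as 2048 MB and taking the 600 MB pool default as the VM's effective reservation, as described above:

```python
def vswp_size_mb(configured_mb: int, reservation_mb: int) -> int:
    """Size of the .vswp file ESX creates: configured memory minus reservation."""
    return max(configured_mb - reservation_mb, 0)


before = vswp_size_mb(2048, 600)   # 600 MB reservation from the pool default
after = vswp_size_mb(2048, 2048)   # reservation matched to allocated RAM
print(before, after)               # 1448 0
```

The file's size only bounds how much memory can be swapped; whether the host actually swaps depends on memory contention on that host.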

At the same time, in vCenter I deleted the snapshot that was in place. The snapshot was just over 3 GB, so it didn't take too long to merge back in.

After all of this finished, I went back into the Sun storage analytics. What an amazing change!

Shaded in yellow is the combined storage traffic from just the View Manager's delta and vswp files. The highlighted data point shows 3,122 ops per sec to the swap file and 225 to the delta.

Notice how the overall NFS traffic drops pretty drastically after about 9:25. The vswp file is empty now, and the machine's Task Manager shows very little page file usage.

VMware / Sun analytics

February 19, 2010

I was going to write a post about processing vscsiStats in Excel in and of itself, but today an opportunity presented itself that I hope to exploit.

Our VMware View Manager seemed dog slow, and my boss was complaining to me about it. I thought, OK, this is an opportunity to gather some vscsiStats and process them to see what's going on with the storage on this thing. As a starting point, our View Manager is a Win2k3 box with 2 GB of RAM, running on ESX 4 on a Dell PowerEdge 2950. Backend storage is provided via NFS from a Sun 7410C.

The first thing I did was log in to ESX with PuTTY, clear vscsiStats, and then start gathering statistics. I collected for 60 minutes, exported the data, and then processed it in Excel.
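
For reference, a collection run like this one boils down to a handful of vscsiStats invocations. The Python helper below only builds the command lines to paste into the PuTTY session. The flags shown (-r reset, -s start, -p all -c print all histograms as CSV, -x stop, -w world group ID as listed by vscsiStats -l) are the commonly documented ones, but check vscsiStats -h on your host; the function name, world group ID, and output file name are hypothetical.

```python
import shlex


def vscsistats_commands(world_group_id: int, outfile: str) -> list[str]:
    """Command lines for one vscsiStats run against a single VM.

    Order: reset counters, start collecting, (wait for the collection
    window), print every histogram as CSV into outfile, stop collecting.
    """
    wg = str(world_group_id)
    return [
        shlex.join(["vscsiStats", "-r"]),                # clear existing stats
        shlex.join(["vscsiStats", "-s", "-w", wg]),      # start collection for this VM
        shlex.join(["vscsiStats", "-p", "all", "-c", "-w", wg])
        + " > " + shlex.quote(outfile),                  # dump all histograms as CSV
        shlex.join(["vscsiStats", "-x", "-w", wg]),      # stop collection
    ]


for cmd in vscsistats_commands(1234, "viewmgr.csv"):
    print(cmd)
```

The wait between the second and third commands is the collection window, 60 minutes in this case.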

hmmm. interesting:

IO Lengths

Looks reasonable enough. Let's look at read and write throughput as reported by vscsiStats:

hmmm. 215K average total throughput to storage. That’s not enough to cause the performance degradation that we are seeing. Another bit of calculation in Excel showed me that the View Manager was averaging 22IOPS. Again, not enough to cause any of the problems we are having with it.

So I then logged in to the Sun 7410 head that serves NFS for this ESX cluster. A few minutes looking at the analytics on the Sun and I had my answer.

The green shaded area is NFSv3 ops per second to/from the View Manager. Yep, take another look: 6,750 ops per second in that spike. I left the other boxes listed in there, although their names have been hidden to protect the innocent. The next highest is 530 IOPS for an SAP server, then 294 and 92 for more SAP servers. One of my file servers is next at 61.

Clicking on View Manager in the list to open the hierarchy, I saw that two files made up those 6,750 ops: one named …delta.vmdk and one a vswp file.

The VM has a snapshot, and about half of the ops are going to the delta file. Then, for some reason, the View Manager is exceeding its allocated 2 GB of RAM and using the virtual swap space.

I'm not an expert at any of these tools, but it sure is nice when you can use the tools you have to track things down and find suspected causes of problems. First thing in the morning I will be making changes to this machine: removing the snapshot, increasing the RAM, and setting a reservation so it doesn't use a vswp file.

I’ll post again once I gather more stats to see if this has helped with performance.

Categories: Storage, Vmware