I do not want to rant for hours about why and how a product built mainly for external customers differs from an internal one that, on the strength of its success, gets packaged up and released (as is the case with AWS). But a consistent, working telemetry option in Azure is pretty much missing: there are bits and pieces here and there, but no consolidated story. I am told that even internal teams within Microsoft had to build their own monitoring solutions (something similar to what I am about to describe further down). And as the last piece of rant, let me tell you: whoever designed this chart with this puny level of data resolution must be punished with the most severe penalty ever known to man: actually using it to investigate a production issue.
|A 7-day chart, with 14 data points. Whoever designed this UI should be punished with the most severe penalty known to man ... actually using it - to investigate a production issue.|
And here, I am presenting a solution to the telemetry problem that can give you these kinds of sexy charts, very quickly, on top of your existing Azure WAD tables (and other data sources) - tried, tested and working, requiring some setup and very little maintenance.
If you are already familiar with the ELK (Elasticsearch, LogStash and Kibana) stack, you might be saying you already have that. True. But while LogStash is great and has many groks, it has been very much designed with the Linux mindset: just a daemon running locally on your box/VM, reading your syslog and delivering it over to Elasticsearch. The way Azure works is totally different: the local monitoring agent running on the VM keeps shovelling your data to durable and highly available storage (Table or Blob), which I quite like. With VMs being essentially ephemeral, it makes a lot of sense to master your logging outside the boxes and to read the data from that storage. Now, that is all well and good, but when you have many instances of the same role (say you have scaled to 10 nodes) writing to the same storage, the data is usually much more than a single process can handle, so the shovelling needs to be scaled out, which requires centralised scheduling.
The gist of it: I am offering ECK (Elasticsearch, ConveyorBelt and Kibana), an alternative to LogStash that is Azure friendly (it typically runs in a Worker Role), can tap into your existing WAD logs (as well as custom ones) out of the box, and with the push of a button can be scaled horizontally to N nodes to handle the load for all your projects, and for your enterprise if you work for one. It is open source, and can be extended to shovel data from any other source.
At its core, ConveyorBelt employs a clustering mechanism that breaks down the work into chunks (scheduling), keeps a pointer to the last scheduled point, pushes data to Elasticsearch in parallel and in batches, and gracefully retries the work if it fails. It is headless, so any node can fail, be shut down, restarted, added or removed without affecting the integrity of the cluster. All of this without waking you up at night, and after a few days you basically forget it ever existed. In the enterprise I work for, we use just 3 Medium instances to power analytics from 70 different production Storage Tables (and blobs).
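To make the "gracefully retry" part concrete, here is a minimal Bash sketch of retrying a unit of work with exponential backoff. It is not ConveyorBelt's code (CB is .NET and relies on BeeHive for this); with_retry is a hypothetical helper used only for illustration.

```shell
# Hypothetical helper illustrating bounded retry with exponential backoff.
# Usage: with_retry <max_attempts> <initial_delay_seconds> <command> [args...]
with_retry() {
  local max_attempts=$1 delay=$2
  shift 2
  local attempt=1
  while true; do
    # Run the unit of work; success ends the loop.
    if "$@"; then
      return 0
    fi
    if (( attempt >= max_attempts )); then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    echo "attempt $attempt failed, retrying in ${delay}s" >&2
    sleep "$delay"
    delay=$(( delay * 2 ))      # double the wait before the next attempt
    attempt=$(( attempt + 1 ))
  done
}
```

For example, with_retry 5 1 curl -sf -XPOST http://localhost:9200/_bulk --data-binary @chunk.json would push a (hypothetical) pre-built bulk file, waiting 1, 2, 4 and 8 seconds between failed attempts.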
First of all, there is a one-to-one mapping between an Elasticsearch cluster and a ConveyorBelt cluster. ConveyorBelt has a list of DiagnosticSources, typically stored in an Azure Table Storage, which contains all data (and state) pertaining to each source. A source is typically a Table Storage or a blob folder containing diagnostic (or other) data, but CB is extensible to accept other data stores such as SQL, files or even Elasticsearch itself (yes, if you ever wanted to copy data from one ES to another). A DiagnosticSource contains the connection information CB needs to connect to it. CB continuously breaks down the work (schedules) for its DiagnosticSources and keeps updating their LastOffset.
Once the work is broken down into bite-size chunks, the chunks are picked up by actors (CB internally uses BeeHive) and the data within each chunk is pushed up to your Elasticsearch cluster. There is usually a delay between data being captured and it landing in storage (how often data is copied is something you typically set in the Azure diagnostics configuration), so you set a Grace Period after which, if the data isn't there, it is assumed it won't be. Your Elasticsearch data will usually be behind real time by the Grace Period. If you left everything at the defaults, Azure copies data every minute, for which a Grace Period of 3-5 minutes is safe. For IIS logs this is usually longer (I use 15-20 minutes).
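The scheduling and Grace Period logic can be sketched in a few lines of Bash. This is an illustration under assumptions, not CB's actual scheduler: offsets are Unix epoch seconds, chunks have a fixed length, and schedule_chunks is a hypothetical name. Only full chunks that end before "now minus the Grace Period" are scheduled; the remainder waits for the next run.

```shell
# Hypothetical sketch of Grace-Period-aware scheduling.
# Usage: schedule_chunks <last_offset> <now> <grace_seconds> <chunk_seconds>
# Prints one "start end" pair per chunk (end exclusive), all in epoch seconds.
schedule_chunks() {
  local offset=$1 now=$2 grace=$3 chunk=$4
  # Anything newer than the horizon may not have been copied to storage yet.
  local horizon=$(( now - grace ))
  while (( offset + chunk <= horizon )); do
    echo "$offset $(( offset + chunk ))"
    offset=$(( offset + chunk ))
  done
}
```

For a 4-minute Grace Period and 1-minute chunks you might call schedule_chunks "$last_offset" "$(date +%s)" 240 60; the end of the last printed chunk is what would be stored back as the new LastOffset.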
The data that is pushed to Elasticsearch requires:
- An index name: by default, the date in the yyyyMMdd format is used as the index name (but you can provide your own index)
- The type name: default is PartitionKey + _ + RowKey (or the one you provide)
- Elasticsearch mapping: the Elasticsearch equivalent of a schema, which defines how to store and index data for a source. These mappings are stored at a URL (a web folder or a public read-only Azure Blob folder). Mappings for typical Azure data (WAD logs, WAD Perf data and IIS logs) are already available by default; you just need to copy them to your site or public Blob folder.
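The two naming defaults above can be mimicked with a couple of Bash functions, for example when you want to query the right index by hand. Both helpers are hypothetical; the sketch assumes dates are taken in UTC (the text above does not say which time zone CB uses) and needs GNU date for the -d @epoch syntax.

```shell
# Hypothetical helper: default index name (yyyyMMdd) for an epoch timestamp.
# UTC is an assumption made for this sketch.
index_name() {
  date -u -d "@$1" +%Y%m%d
}

# Hypothetical helper: default type name, PartitionKey + "_" + RowKey.
type_name() {
  printf '%s_%s\n' "$1" "$2"
}
```

For example, index_name "$(date +%s)" prints today's index name, which you can then use in a search URL such as http://localhost:9200/<index>/_search.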
Set up your own monitoring suite
OK, now it is time to create our own ConveyorBelt cluster! Basically, the CB cluster will shovel the data to an Elasticsearch cluster, and you will need Kibana to visualise your data. Further below I explain how to set up Elasticsearch and Kibana on a Linux VM. But ...
if you are just testing the waters and want to try CB, you can create a Windows VM, download Elasticsearch and Kibana and run their batch files and then move to setting up CB. But after you have seen it working, come back to the instructions and set it up in a Linux box, its natural habitat.
So setting this up in Windows is just a matter of downloading Elasticsearch and Kibana, unzipping them and running the batch files elasticsearch.bat and kibana.bat. Make sure you expose ports 5601 (Kibana) and 9200 (Elasticsearch) from your VM by creating endpoints.
Set up ConveyorBelt
As discussed above, ConveyorBelt is typically deployed as an Azure Cloud Service. To do that, you need to clone the GitHub repo, build it and then deploy it with your own credentials and settings, all of which should be pretty easy. Once it is deployed, you define your diagnostic sources and point them to your Elasticsearch, then just relax and let CB do its work. Let's look at the steps now.
Clone and build ConveyorBelt repo
You can use the command line:
git clone https://github.com/aliostad/ConveyorBelt.git
Or use your tool of choice to clone the repo. Then open an administrative PowerShell window, move to the build folder and execute .\build.ps1
Elasticsearch is able to guess the data types of your data and index them in a format that is usually suitable. However, this is not always the case, so we need to tell Elasticsearch how to store each field, and that is why CB needs to know the mappings in advance.
To deploy the mappings, create a Blob Storage container with the access option "Public Container"; this makes its content publicly available in a read-only fashion, at a URL of this form:
https://<storage account name>.blob.core.windows.net/<container name>/
Then use the tool of your choice to copy the mapping files from the mappings folder under the ConveyorBelt directory into this container.
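Since ConveyorBelt reads the mappings anonymously, it is worth checking that a copied file really is publicly readable. A minimal Bash sketch, assuming curl is installed; mapping_url and check_public are hypothetical helpers, and WADLogsTable.json is only an example file name (use the names actually found in the mappings folder).

```shell
# Hypothetical helper: public URL of a file in a Blob container.
mapping_url() {
  local account=$1 container=$2 file=$3
  printf 'https://%s.blob.core.windows.net/%s/%s\n' "$account" "$container" "$file"
}

# Hypothetical helper: succeeds only if an anonymous HEAD request returns
# a non-error status (curl -f fails on HTTP 400 and above).
check_public() {
  curl -sfI "$1" > /dev/null
}
```

Usage: check_public "$(mapping_url myaccount mappings WADLogsTable.json)" && echo "publicly readable", where myaccount and mappings stand in for your own account and container names.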
Configure and deploy
Once you have built the solution, rename the tokens.json.template file to tokens.json and edit it (if you need more info, find the instructions here). Then, in the same PowerShell window, run the command below, replacing the placeholders with your own values:
.\
- serviceName: name of your ConveyorBelt Azure service
- storageAccountName: name of the storage account needed for deploying the service
- subscriptionDataFile: your .publishsettings file
- selectedsubscription: name of the subscription to use
- affinityGroupName: affinity group or Azure region to deploy to
After running the command, you should see PowerShell deploying CB to the cloud with a single Medium instance. In the storage account you defined, you should now find a new table, whose name you defined in the tokens.json file.
Configure your diagnostic sources
Configuring the diagnostic sources can differ wildly depending on the type of the source. But for standard tables such as WADLogsTable, WADPerformanceCountersTable and WADWindowsEventLogsTable (whose mapping files you just copied) it is straightforward.
Now choose an Azure diagnostic Storage Account with some data, and in the diagnostic source table, create a new row and add the entries below:
And save. OK, now CB will start shovelling your data to your Elasticsearch and you should start seeing some data. If you do not, look at the entries you have created in the Table Storage: you will find an Error column that tells you what went wrong. To investigate further, just RDP to one of your ConveyorBelt VMs and run DebugView with "Capture Global Win32" enabled; you should see activity similar to the picture below. Any exceptions will also show up there.
OK, that is it... you are done! ... well barely 20 minutes, wasn't it? :)
Now in case you are interested in setting up ES+Kibana in Linux, here is your little guide.
Set up your Elasticsearch in Linux
Ideally you need to add a Disk Volume, as the VM disks are ephemeral; all you need to know is outlined here. Make sure you follow the instructions to re-mount the drive after reboots. Another alternative, especially for your dev and test environments, is to go with D-series machines (SSD disks) and use their ephemeral disks: they are fast, and if you lose the data you can always set ConveyorBelt to re-add it, which it does quickly. As I said before, never use Elasticsearch as the master store for your logging data, so that you can recover if you lose it.
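Re-mounting after a reboot usually comes down to an /etc/fstab entry. The Bash sketch below only prints such a line; fstab_entry is a hypothetical helper, the UUID, mount point and ext4 file system are assumptions, and nofail is there so that the VM still boots if the disk is not attached.

```shell
# Hypothetical helper: print an /etc/fstab line that mounts a data disk,
# identified by its file system UUID, at boot.
# Usage: fstab_entry <uuid> <mount_point> [fs_type]
fstab_entry() {
  local uuid=$1 mount_point=$2 fs_type=${3:-ext4}
  # "nofail" lets the boot continue even if the disk is missing.
  printf 'UUID=%s %s %s defaults,nofail 0 2\n' "$uuid" "$mount_point" "$fs_type"
}
```

Usage: uuid=$(sudo blkid -s UUID -o value /dev/sdc1) followed by fstab_entry "$uuid" /mounted | sudo tee -a /etc/fstab, where /dev/sdc1 is a hypothetical device name; check yours with lsblk first.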
Almost all of the commands and settings below need to be run in an SSH session. If you are a geek with a lot of Linux experience, you might find some of the details below obvious and unnecessary, in which case just move on.
|SSH is your best friend|
Anyway, back to setting up ES - after you got your VM box provisioned, SSH to the box and install Oracle JDK:
sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java7-installer
And then install Elasticsearch:
wget https://download.elastic.co/elasticsearch/elasticsearch/elasticsearch-1.7.1.deb
sudo dpkg -i elasticsearch-1.7.1.deb
Now you have installed ES v1.7.1. To set Elasticsearch to start at reboot (the equivalent of a Windows service), run these commands in SSH:
sudo update-rc.d elasticsearch defaults 95 10
sudo /etc/init.d/elasticsearch start
Now, ideally you would want to move the data and logs to the durable drive you have mounted. To do so, open the Elasticsearch config in vim:
sudo vim /etc/elasticsearch/elasticsearch.yml
and change these settings (note the uncommented lines):
path.data: /mounted/elasticsearch/data
# Path to temporary files:
#
#path.work: /path/to/work
# Path to log files:
#
path.logs: /mounted/elasticsearch/logs
Now you are ready to restart Elasticsearch:
sudo service elasticsearch restart
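After a restart it can take a few seconds before Elasticsearch accepts requests. A small Bash sketch that polls a URL until it answers; wait_for_http is a hypothetical helper, and it assumes curl is installed and Elasticsearch is listening on the default port 9200.

```shell
# Hypothetical helper: poll a URL until it answers without an HTTP error,
# giving up after roughly <timeout_seconds>.
# Usage: wait_for_http <url> <timeout_seconds>
wait_for_http() {
  local url=$1 timeout=$2 waited=0
  until curl -sf "$url" > /dev/null; do
    if (( waited >= timeout )); then
      echo "$url not ready after ${timeout}s" >&2
      return 1
    fi
    sleep 1
    waited=$(( waited + 1 ))
  done
}
```

Usage: wait_for_http http://localhost:9200/ 60 && curl 'http://localhost:9200/_cluster/health?pretty'. The same helper works for Kibana on port 5601.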
Note: Elasticsearch is memory, CPU and IO hungry. SSD drives really help, but if you do not have them (D-series VMs do), make sure you provide plenty of RAM and enough CPU. Searches are CPU heavy, so how much CPU you need will depend on the number of concurrent users. If your machine has a lot of RAM, make sure you change the ES memory settings, as the defaults are small. So update the file below and set the memory to 50-60% of the total memory of the box:
sudo vim /etc/default/elasticsearch
And uncomment this line and set the memory size to half of your box's memory (here 14GB, just an example!):
ES_HEAP_SIZE=14g
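Rather than working out the number by hand, you can derive it from /proc/meminfo. A minimal Bash sketch that halves MemTotal and rounds down to whole gigabytes; heap_size_for_kb is a hypothetical helper, and the 1g floor is an arbitrary choice for this sketch.

```shell
# Hypothetical helper: suggest an ES_HEAP_SIZE value of half the RAM,
# rounded down to whole gigabytes.
# Usage: heap_size_for_kb <total_memory_kb>
heap_size_for_kb() {
  local total_kb=$1
  local half_gb=$(( total_kb / 2 / 1024 / 1024 ))
  # Never suggest less than 1g (arbitrary floor for this sketch).
  if (( half_gb < 1 )); then
    half_gb=1
  fi
  echo "${half_gb}g"
}
```

Usage: heap_size_for_kb "$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)" prints, for example, 14g on a box reporting 28 GB of RAM.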
There are potentially other changes you might want to make. For example, based on the number of your nodes, you will want to set index.number_of_replicas in your elasticsearch.yml; if you have a single node, set it to 0. Also turn off multicast (Zen) discovery, since multicast will not work in Azure. But these are things you can start learning about once you are completely hooked on the power of the information this solution provides. Believe me, it is more addictive than narcotics!
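As a starting point, the two settings mentioned above could be generated with a hypothetical Bash helper. Zero replicas on a single node follows from the fact that Elasticsearch never allocates a replica on the same node as its primary; one replica for larger clusters is just an assumption. discovery.zen.ping.multicast.enabled is the Elasticsearch 1.x setting name.

```shell
# Hypothetical helper: print elasticsearch.yml lines for the replica count
# and for disabling multicast discovery.
# Usage: es_cluster_settings <node_count>
es_cluster_settings() {
  local nodes=$1 replicas=1        # 1 replica for multi-node is an assumption
  if (( nodes <= 1 )); then
    replicas=0                     # a replica could never be allocated anyway
  fi
  echo "index.number_of_replicas: $replicas"
  echo "discovery.zen.ping.multicast.enabled: false"
}
```

Usage: es_cluster_settings 1 | sudo tee -a /etc/elasticsearch/elasticsearch.yml, followed by a restart. With multicast off, a multi-node cluster also needs discovery.zen.ping.unicast.hosts listing the other nodes.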
Set up Kibana in Linux
Installing Kibana is straightforward. You just need to download and unpack it:
wget https://download.elastic.co/kibana/kibana/kibana-4.1.1-linux-x64.tar.gz
tar xvf kibana-4.1.1-linux-x64.tar.gz
Kibana is now downloaded to your home directory and unpacked into the kibana-4.1.1-linux-x64 folder. If you want to see where that folder is, you can run pwd to get the path.
Now, to start Kibana, just run the commands below:
cd kibana-4.1.1-linux-x64/bin
./kibana
That will do for testing whether it works, but you need to configure it to start at boot. We can use upstart for this. Just create a file in the /etc/init folder:
sudo vim /etc/init/kibana.conf
and copy in the content below (the path could be different for you) and save:
description "Kibana startup"
author "Ali"
start on runlevel [2345]
stop on runlevel [!2345]
exec /home/azureuser/kibana-4.1.1-linux-x64/bin/kibana
Now run this command to make sure there is no syntax error:
If all is good, start the service:
sudo start kibana
If you have installed Kibana on the same box as Elasticsearch and left all ports at their defaults, you should now be able to browse to the server on port 5601 (make sure you expose this port on your VM by configuring endpoints) and see the Kibana screen (obviously with no data yet).