Last week I attended the Splunk Live! event in Amsterdam. This is an event organised by Splunk itself, aimed at teaching the Splunk community about their product. Some of the speakers at the event were Splunk CIO Doug Harr and Splunk Sales Engineer Marco Paniagua, but maybe even more interesting were Splunk users Wiam Vos from Kadaster and Karl Lovink from the Belastingdienst.
Splunk is a tool which collects data (any data!) in any amount, from any location and from any source. Since there is no upfront schema defined in Splunk, you can really import any data you like. This, as Splunk tells us, is the strength of their product. Splunk indexes all the data it receives on so-called indexers, and via a search-head you are able to search or view the data in a dashboard. This can be done via basic search strings or via advanced graphs and/or apps.
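To give an idea of what such a basic search string looks like, here is a minimal sketch that charts web requests per HTTP status code over time. The index and sourcetype names ("web" and "access_combined") are just assumptions for illustration and will differ per environment:

```
index=web sourcetype=access_combined
| timechart span=1h count by status
```

The first line selects the events, everything after the pipe transforms them; this pipe-style chaining is how most Splunk searches are built up.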
Splunk can be downloaded and installed very easily. Once you have installed it you can add sources and play around (a minimal example of adding a source follows the list below). Splunk indexes all the information you feed it and you can search and graph all that data any way you like. What is even better is that you can use Splunkbase to install apps. This can save you a lot of work, since good apps have already been developed by other people and you can use them to visualise specific data. Some examples of apps to view application-specific data are:
- VMware
- UCS
- Netflow
- Exchange
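As a sketch of how adding a source might look, the inputs.conf snippet below tells Splunk to monitor a syslog directory. The path, sourcetype and index names are assumptions for illustration only:

```
# inputs.conf (for example in $SPLUNK_HOME/etc/system/local/)
# Monitor syslog files; sourcetype and index are assumed names.
[monitor:///var/log/syslog]
disabled = false
sourcetype = syslog
index = main
```

The same can of course also be done through the web interface under Data Inputs.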
The default license is Free, which gives you the ability to index 500MB per day. If you exceed this amount of data you need an Enterprise license; the price depends on the amount of data you index with Splunk per day. With the Enterprise license you also get some extra features like access control and index replication.
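To keep an eye on that 500MB limit you can search Splunk's own internal logs. A rough sketch, assuming the default _internal index and license_usage.log are available as in a standard installation:

```
index=_internal source=*license_usage.log type=Usage
| timechart span=1d sum(b) AS bytes_indexed
| eval MB_indexed = round(bytes_indexed/1024/1024, 1)
| fields _time MB_indexed
```

This gives a per-day overview of how many megabytes you have indexed, so you can see how close you are to the limit.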
You could run Splunk in a virtual environment, but it is important to keep in mind that Splunk needs a lot of disk I/O. Therefore it might be better to use dedicated hardware for Splunk. To make sure Splunk performs well in a virtualized environment, you could give each Splunk virtual machine a dedicated amount of CPU, memory and disk space.
For good performance it might also be wise to use separate machines for indexing and for searching. For example, you could use a load balancer to balance traffic between two locations, with one search-head and one indexer running on each side. You could then send data from all reporting devices to both indexers. In this way you geographically separate your data (for disaster recovery purposes) and balance the load of the servers as well, which improves the user experience when using this tool.
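A minimal sketch of the forwarding side of such a setup is the outputs.conf below, which clones the data from a reporting device (running a Splunk forwarder) to an indexer in each location. The group and host names are of course assumptions:

```
# outputs.conf on the forwarder
# Listing both groups in defaultGroup clones the data to each location.
[tcpout]
defaultGroup = site_a, site_b

[tcpout:site_a]
server = indexer-a.example.com:9997

[tcpout:site_b]
server = indexer-b.example.com:9997
```

Because both indexers receive a full copy of the data, either location can serve searches on its own if the other one goes down.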
In the near future I hope to do some tests with real network traffic and post some results here as well.