Alright, if you’ve arrived here to know my opinion about beautiful people, sorry for the title, this is not for you. I am of course referring to Splunk data models (DM).

As a Splunk content engineer who sometimes has to deal with and solve DM-related issues before prototyping another detection, I find this is a topic that keeps coming back, so here’s my take on it.

Should I keep using Data models – or not?

Well, that’s still a very valid question.

Just to start with and quickly address one of the elephants in the room: Splunk Enterprise Security (ES), also known as Splunk’s SIEM product, heavily relies on DMs to function. Therefore, anyone NOT using ES still has the option to use DMs or not.

But keep on reading; I will address this question further on, as I need more context around it. If you are looking for a “TL;DR” section, skip to “Myths Busted” below.

Without getting too “high-level” on the definition, assuming you land here with some SPL knowledge, a DM is essentially an elaborate Splunk query. Let’s have a closer look at the following command:

| datamodel Authentication search

What happens under the hood is a regular Splunk search (index=x foo=bar), also known as the “root search”, but with an additional spice: tags.

Think about Unix and Windows authentication logs. Despite coming from different systems, those logs should fit into the Authentication DM, since a user must provide at least an account and a system (subjects) the user is authenticating against.

How to tell Splunk those are essentially the same type of messages? You “tag” relevant events so that they are picked up by the DM’s root search. Note that tagging is done at the event level, not on a sourcetype basis, which implies that one sourcetype might hold multiple tags and therefore feed multiple DMs.

In the end, that search will basically look like the following one:

(index=win OR index=unix) tag=authentication

Another clever bit: you can also define subsets of authentication-related events by splitting them up into distinct DM objects. For instance, the Authentication DM has an object/node called “Failed_Authentication”, where additional constraints are defined to pick failed login attempts.

To know more about a data model definition, its root search and objects/nodes, check the following:

| datamodel

Side note: to better walk the DM’s config and attributes, check the datamodelsimple command.

To search events matching a specific DM object, for instance Failed_Authentication, use this:

| datamodel Authentication Failed_Authentication search

If you are curious enough to investigate deeper, check the Job Inspector “Search job properties” values, especially the eventSearch, normalizedSearch and optimizedSearch ones.

Don’t be surprised by the lengthy expanded queries! Depending on the configuration, they can take hundreds of lines.

[Image: example of an expanded search from the Malware_Attacks object of the Malware DM]

Quick note: David Shpritz pointed out that one can also use the datamodel command’s acceleration_search_string parameter to verify the exact query executed, as follows:

| datamodel malware Malware_Attacks search

Note how the index constraint is dynamically built: (index=* OR index=_*). This implies that somehow all indexes will be touched, which impacts performance, to say the least. And that’s how many ES deployments will behave without further tuning.

Needless to say, this behavior, along with how Splunk handles tag-based queries, is one of the main reasons, if not the main one, why some customers struggle with ES, assuming it needs more HW and making other wrong claims. As a side note: be sure to check the “Watch your TAGs” section below to know more on that.

The best practice is to always define or limit the scope of which indexes should be considered for each DM. In this case, if the user defines “mcafee” and “symantec” as the only indexes containing events relevant to the Malware DM, the beginning of that expanded query would look like the following:

search ((index=mcafee OR index=symantec) tag=malware tag=attack (...)

And that would significantly reduce the performance hit on the system.

Simply put: take that root search, which encompasses all of the DM’s objects, run it every 5 minutes and dump the results into a data store. That’s how the accelerated data is created. Splunk also refers to this storage area as the “high-performance analytics store”. For long-time splunkers in the room: that’s the infamous tsidx file, which is actually part of any index bucket.

In practice, accessing data collected and defined within a DM is as fast as listing the number of events from all (non-internal) indexes and sourcetypes on a system:

| tstats count where index=* by index, sourcetype

Similarly, we can count the number of events making it into the accelerated Authentication DM with the following query:

| tstats count FROM datamodel=Authentication
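To make the earlier point about index constraints concrete, here is a minimal Python sketch of how a DM root search combines an index constraint with tags. It is only an illustration of the string being assembled, not how Splunk builds the expanded search internally; the helper name `build_root_search` and its parameters are hypothetical.

```python
def build_root_search(tags, indexes=None):
    """Assemble a simplified DM root search string.

    Hypothetical helper for illustration only: Splunk generates the real
    expanded search itself, and that search is far longer than this.
    """
    if indexes:
        # Scope restricted to the indexes defined for this DM.
        index_clause = " OR ".join(f"index={name}" for name in indexes)
    else:
        # No restriction configured: every index ends up in scope.
        index_clause = "index=* OR index=_*"
    tag_clause = " ".join(f"tag={tag}" for tag in tags)
    return f"({index_clause}) {tag_clause}"


if __name__ == "__main__":
    # Authentication DM limited to two indexes.
    print(build_root_search(["authentication"], ["win", "unix"]))
    # Malware DM without any index restriction.
    print(build_root_search(["malware", "attack"]))
    # Malware DM limited to the two relevant indexes.
    print(build_root_search(["malware", "attack"], ["mcafee", "symantec"]))
```

The only difference between the untuned and the tuned Malware search is the index clause, and that clause is what decides whether every index on the system gets touched.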