
A Beginner’s Guide to Deterministic and Probabilistic Tracking

In one of our recent articles, we looked at the growing need for cross-device tracking and attribution, and at how digital marketers can use it to understand consumer behaviour. However, there is no single approach to gathering the user data these models rely on: there are two distinct models, deterministic and probabilistic tracking.

Although both deterministic and probabilistic tracking provide information on consumer behaviour, they go about it in completely different ways. Deterministic tracking draws on actual user data to build a reliable picture of each user's identity and touch points, whereas probabilistic tracking analyses billions of touch points to estimate the probability that two separate visits were made by the same user.

Both methods produce a model of consumer behaviour that can inform how products and advertisements are sold, but their differences affect how those models are used and by whom.

Functional Differences

Deterministic tracking relies on users logging in to a website, which allows their touch points to be tracked. The user is assigned an anonymised tracking ID, and all subsequent consumer data is matched to this ID and used for strategies such as product suggestions. The result is a collection of user profiles that marketers and analysts can segment in depth, including information such as purchase history and browsing data. The type of cross-device consumer data collected depends on the products the website offers and the questions it asks its users.
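To make the login-to-ID step concrete, here is a minimal sketch, assuming the login identifier is an email address and that the "anonymised" ID is a salted SHA-256 hash of it (strictly speaking a pseudonymous ID). The salt, the event fields and the function names are all hypothetical; the point is simply that every event carrying the same login lands in the same profile, whichever device it came from.

```python
import hashlib
from collections import defaultdict

# Hypothetical salt; a real system would keep this secret and manage it centrally.
SALT = "example-salt"


def anonymised_id(login_email: str) -> str:
    """Derive a stable pseudonymous tracking ID from a login identifier."""
    normalised = login_email.strip().lower()  # same person, same ID despite casing/spaces
    return hashlib.sha256((SALT + normalised).encode("utf-8")).hexdigest()[:16]


def build_profiles(events):
    """Group logged-in events by tracking ID, regardless of the device used."""
    profiles = defaultdict(list)
    for event in events:
        uid = anonymised_id(event["login"])
        profiles[uid].append((event["device"], event["action"]))
    return dict(profiles)


events = [
    {"login": "Alice@example.com", "device": "phone", "action": "view:shoes"},
    {"login": "alice@example.com ", "device": "laptop", "action": "purchase:shoes"},
    {"login": "bob@example.com", "device": "tablet", "action": "view:hats"},
]
profiles = build_profiles(events)
print(len(profiles))  # two users, even though there are three devices
```

Alice's phone view and laptop purchase end up in one profile only because she logged in on both devices; without the login there would be nothing to match on.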

Probabilistic tracking, by contrast, uses algorithms to collate huge amounts of data and make inferences about user behaviour on site. It does this by analysing billions of touch points from a number of sources, including IP addresses and browsing patterns, and uses this data to attribute activity on separate devices to a single user. This type of tracking can offer accuracy rates higher than 90% and therefore offers invaluable insights into user behaviour without requiring users to be logged in to a site.
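As a rough illustration of how signals such as a shared IP address and overlapping browsing patterns can be turned into a probability, the sketch below scores a pair of visits with a simple logistic model. The signals, weights and bias are hypothetical and hand-picked; commercial providers use far more signals and train their models on large datasets, so treat this as a toy version of the idea rather than any vendor's method.

```python
import math

# Hypothetical, hand-picked weights for illustration only.
WEIGHTS = {"same_ip": 2.5, "same_city": 1.0, "shared_page": 0.4}
BIAS = -3.0  # visits are assumed unrelated unless the signals say otherwise


def match_probability(a, b):
    """Estimate the probability that visits a and b were made by the same user."""
    score = BIAS
    if a["ip"] == b["ip"]:
        score += WEIGHTS["same_ip"]
    if a["city"] == b["city"]:
        score += WEIGHTS["same_city"]
    shared_pages = len(set(a["pages"]) & set(b["pages"]))
    score += WEIGHTS["shared_page"] * shared_pages
    return 1 / (1 + math.exp(-score))  # logistic function maps the score into (0, 1)


# Example visits using documentation-only IP ranges.
phone = {"ip": "203.0.113.7", "city": "London", "pages": ["shoes", "hats", "sale"]}
laptop = {"ip": "203.0.113.7", "city": "London", "pages": ["shoes", "sale"]}
stranger = {"ip": "198.51.100.2", "city": "Leeds", "pages": ["garden"]}

print(round(match_probability(phone, laptop), 2))
print(round(match_probability(phone, stranger), 2))
```

The phone and laptop visits share an IP address, a city and two pages, so they score as a likely match, while the unrelated visit stays close to zero. No login is involved at any point.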

Because of these functional differences, the data each model produces is very different. Deterministic data can be taken at face value, although critics have warned against the walled garden effect, in which deterministic models fail to account for user behaviour before login. A process known as session stitching aims to eliminate this issue by recording pre-login touch points and attributing them to the user ID once the user logs in. However, this still relies on the user authenticating on the device, and on that device not being shared by several different users, as this would skew the data.
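Session stitching can be sketched in two passes: first find which user logged in during each browsing session, then back-fill that user ID onto the session's anonymous, pre-login events. The event format and the function name below are hypothetical. Note the built-in assumption that one session belongs to one person; if two people share a device and both log in, this sketch credits the anonymous events to whoever logged in first, which is exactly the skew described above.

```python
def stitch_sessions(events):
    """Attribute anonymous pre-login events to the user who later logged in
    during the same session (assumes one user per session)."""
    # Pass 1: record the first user seen authenticating in each session.
    session_user = {}
    for event in events:
        if event["user"] is not None:
            session_user.setdefault(event["session"], event["user"])

    # Pass 2: return new event dicts, filling in the user where it was missing.
    return [
        {**event, "user": event["user"] or session_user.get(event["session"])}
        for event in events
    ]


events = [
    {"session": "s1", "user": None, "action": "view:shoes"},      # before login
    {"session": "s1", "user": "u42", "action": "login"},
    {"session": "s1", "user": "u42", "action": "purchase:shoes"},
    {"session": "s2", "user": None, "action": "view:hats"},       # never logs in
]
for event in stitch_sessions(events):
    print(event)
```

The pre-login product view in session `s1` is now credited to `u42`, while session `s2` stays anonymous because nobody authenticated in it.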

Probabilistic data, on the other hand, is a prediction of behaviour and is therefore subject to change. When or why such a change might happen is nearly impossible to say in any given case, which is why marketers need to analyse this data carefully. For programmatic advertising, however, its accuracy is comparable with that of deterministic data: confidence intervals are already built into segmentation, so the shortfall in accuracy matters less in a process that involves an inherent degree of approximation.
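One simple way confidence can be built into segmentation is to admit only device links whose match probability clears a threshold, and then to report the segment's expected precision as the average probability of the links kept. The sketch below assumes that approach; the threshold of 0.8 and the `(device_a, device_b, probability)` tuple format are hypothetical choices for illustration.

```python
def build_segment(links, threshold=0.8):
    """Keep probabilistic device links at or above a confidence threshold and
    estimate the share of kept links expected to be correct."""
    kept = [link for link in links if link[2] >= threshold]
    if not kept:
        return [], 0.0
    # Summing match probabilities gives the expected number of correct links.
    expected_correct = sum(probability for _, _, probability in kept)
    return kept, expected_correct / len(kept)


links = [
    ("device-1", "device-2", 0.95),
    ("device-3", "device-4", 0.85),
    ("device-5", "device-6", 0.60),  # below threshold: left out of the segment
]
segment, expected_precision = build_segment(links)
print(len(segment), round(expected_precision, 2))
```

An advertiser buying this segment knows up front that roughly one link in ten may be wrong, which is the kind of built-in approximation that makes probabilistic data workable for programmatic buying.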


Deterministic tracking is better suited to websites that can guarantee a large proportion of their users will be logged in while using them. Without a login, deterministic tracking has no way to accrue data. For many websites this is not an issue, as the very nature of the site requires users to log in to access its main functions.

The advantage of deterministic tracking is that its data is based on actual consumer behaviour, so marketers can be confident that the output reveals genuine information about how users behave on site. At a time when marketers are increasingly focused on metrics, the factual stability of deterministic tracking means this model will often win the vote.

Probabilistic tracking is the wiser choice for sites with fewer login requirements and smaller consumer bases. Because probabilistic models are based on scaling probability rather than on a record of actual consumer data, a smaller consumer base will not lead to misrepresentative reporting. Deterministic tracking, by contrast, depends on sample size: a website with a large user base can use it with greater confidence, as any irregularities will be ironed out over time, but for a website with less traffic a deterministic report could give false answers about long-term consumer behaviour.
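The reason irregularities iron out with scale can be seen in the standard margin of error for a measured rate, which shrinks with the square root of the number of users observed. The sketch below uses the usual normal approximation at 95% confidence; the 5% conversion rate and the sample sizes are hypothetical figures chosen only to show the trend.

```python
import math


def margin_of_error(rate, n, z=1.96):
    """Approximate 95% margin of error for an observed proportion
    (normal approximation to the binomial)."""
    return z * math.sqrt(rate * (1 - rate) / n)


# Hypothetical 5% conversion rate measured on audiences of different sizes.
for n in (200, 2_000, 200_000):
    print(f"{n:>7} users: 5% ± {margin_of_error(0.05, n):.2%}")
```

With a few hundred logged-in users the measured rate could easily be off by around three percentage points, which is as large as a difference most campaigns would be trying to detect; with hundreds of thousands of users the uncertainty becomes negligible.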

The fact that probabilistic tracking does not require users to be logged in will also benefit some sites. Many companies, particularly startups with low brand awareness, avoid login requirements because they can lead to a high bounce rate. For these companies, probabilistic tracking models are better suited to understanding how users behave on site.

A major issue in implementing either technology is privacy. The probabilistic model faces an ethical conundrum, as it gathers massive amounts of consumer data without a transparent opt-out service, and an advert that follows a user across devices against their wishes may put them off the brand for good. Deterministic tracking faces criticism for similar reasons but has fewer questions to answer, as the account creation process will usually cover any opt-out requirements.


The choice between deterministic and probabilistic cross-device tracking may come down to the practical issue of user logins. Individual preferences could also play a part, as those with a penchant for hard data will probably opt for a deterministic representation of consumer behaviour.

In terms of their appeal to marketers, deterministic tracking models need to convince retailers that their consumer data offers a comprehensive view of user behaviour, as marketers may be wary of its limited scope. Although probabilistic tracking models can give a more well-rounded view of how a user behaves online, they also need to produce consistently accurate results to justify the considerable processing power their analysis requires, which inevitably comes at a high cost.

The ideal solution for marketers in the future would be a mixed model that is able to cover all the bases and provide a complete picture of consumer behaviour.
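What such a mixed model might look like at its simplest is sketched below: use a deterministic login link whenever one exists, fall back to the most likely probabilistic match if it clears a confidence threshold, and otherwise leave the device unresolved. The data structures, the 0.8 threshold and the function name are hypothetical, and the login map inherits the shared-device caveat discussed earlier.

```python
def resolve_user(device, login_map, candidates, threshold=0.8):
    """Hybrid identity resolution: deterministic first, probabilistic fallback.

    login_map:  device -> user ID, known from logins (deterministic)
    candidates: device -> list of (user ID, match probability) (probabilistic)
    Returns (user ID or None, confidence, method).
    """
    if device in login_map:
        return login_map[device], 1.0, "deterministic"
    best = max(candidates.get(device, []), key=lambda c: c[1], default=None)
    if best is not None and best[1] >= threshold:
        return best[0], best[1], "probabilistic"
    return None, 0.0, "unresolved"


login_map = {"laptop-1": "u42"}
candidates = {
    "phone-9": [("u42", 0.91), ("u77", 0.40)],
    "tablet-3": [("u77", 0.55)],  # too uncertain to use
}
for device in ("laptop-1", "phone-9", "tablet-3", "tv-5"):
    print(device, resolve_user(device, login_map, candidates))
```

Tagging each result with its method and confidence lets reporting treat hard deterministic links and softer probabilistic ones differently, rather than blending them into a single figure.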

Which of these models would your business benefit from? How could it improve your business? Get in touch with us on social media with your thoughts.
