Cisco Application Velocity System User Guide (Software Version 5.0)
Anonymous Base File Statistical Model


Anonymous Base File Statistical Model


This appendix presents a statistical model that quantifies the probability that confidential data common to a set of users (together with the context needed to associate that data with a user) is present in an anonymous base file.

Consider two variables, m and n, where m is the "base file anonymity level" and n is the "base file sample size." This technology creates a single base file, shared by all users, that contains only content common to at least m of n user-specific base files (and thus, users). For example, if m=5 and n=15, the anonymous base file contains only content common to at least 5 of the 15 base files. Content unique to any one of the 15 user-specific base files, or common to fewer than 5 of them, is excluded.
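The filtering rule above can be sketched in a few lines of Python. This is a minimal illustration, not the appliance's implementation: it assumes each user-specific base file can be modeled as a set of content fragments (here, hypothetical strings), and the function name `anonymous_base_file` is hypothetical.

```python
from collections import Counter


def anonymous_base_file(user_base_files, m):
    """Keep only fragments that occur in at least m of the given base files.

    Each base file is modeled (hypothetically) as an iterable of content
    fragments; a fragment repeated within one file is counted once.
    """
    counts = Counter()
    for base_file in user_base_files:
        counts.update(set(base_file))  # count each fragment once per file
    return {fragment for fragment, k in counts.items() if k >= m}


# Hypothetical sample of n = 6 per-user base files, anonymity level m = 2.
files = [
    {"<header>", "<nav>", "Welcome, Alice"},
    {"<header>", "<nav>", "Welcome, Bob"},
    {"<header>", "<footer>", "Welcome, Carol"},
    {"<header>", "<nav>", "<footer>"},
    {"<header>", "Welcome, Dave"},
    {"<header>", "<footer>", "Welcome, Erin"},
]
print(sorted(anonymous_base_file(files, m=2)))  # ['<footer>', '<header>', '<nav>']
```

Each "Welcome, ..." greeting appears in only one file, so it is excluded, while the shared page structure survives.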

The user-specific base files in the base file sample are created from the first n unique requests (that is, requests with unique application appliance cookie IDs) for a given URL or set of URLs, depending on the application appliance configuration. The n base files in the sample are of a Per-User type created solely to enable this feature; these Per-User base files are not themselves used to condense content. Only the resulting anonymous base file is used to condense content.
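The sampling rule can be sketched as follows. This is a hypothetical illustration: requests are modeled as (cookie_id, url) pairs in arrival order, and the function name `select_sample` and the sample data are assumptions, not appliance interfaces.

```python
def select_sample(arrivals, urls, n):
    """Return the first n requests for the configured URLs that carry
    distinct cookie IDs, in arrival order.

    `arrivals` is a hypothetical sequence of (cookie_id, url) pairs;
    repeat requests from a cookie ID already in the sample are skipped.
    """
    if n <= 0:
        return []
    seen = set()
    sample = []
    for cookie_id, url in arrivals:
        if url not in urls or cookie_id in seen:
            continue  # URL not configured, or this user is already sampled
        seen.add(cookie_id)
        sample.append((cookie_id, url))
        if len(sample) == n:
            break  # sample is complete
    return sample


# Hypothetical arrivals: "c1" requests /a twice, "c3" requests another URL.
arrivals = [("c1", "/a"), ("c2", "/a"), ("c1", "/a"), ("c3", "/b"), ("c4", "/a")]
print(select_sample(arrivals, {"/a"}, 3))  # [('c1', '/a'), ('c2', '/a'), ('c4', '/a')]
```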

The BaseFileAnonLevel configuration parameter lets the administrator select a value of m. The application appliance automatically sets the base file sample size n to the greater of 3m and 5 (n = max(3m, 5)) so that the probability of creating a shared anonymous base file that contains confidential information common to the user-specific base files in the sample stays extremely low. Based on extensive testing, Cisco recommends an anonymity level of m=2 for those who use this feature. Note that the anonymous base files feature is an All-User Condensation option and must be explicitly configured to be enabled.
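The sample-size rule n = max(3m, 5) is simple enough to tabulate directly. The sketch below is illustrative only; `sample_size` is a hypothetical helper that takes the BaseFileAnonLevel value m.

```python
def sample_size(m):
    """Base file sample size n = max(3m, 5) for anonymity level m."""
    return max(3 * m, 5)


for m in range(1, 8):
    print(f"m={m}  n={sample_size(m)}")
```

The floor of 5 only matters at m=1; from m=2 onward n=3m, so the recommended m=2 yields n=6.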

Statistical Model for Anonymous Base File Technology

This section develops that model and evaluates it for two assumed values of the underlying event probability.

Recall the earlier definitions of m (the "base file anonymity level") and n (the "base file sample size"). Given m and n, the only way for the anonymous base file to include user-specific context is for at least m of the n base files in the sample to contain the same confidential user-specific information, such as a home address or credit card number. As an example of how this might occur, consider the following scenario for m=2.

Suppose different users on different machines use the same credit card online, as can happen when a corporate or shared family credit card is used for online transactions. If these transactions occur in such rapid succession that their user-specific base files are all selected during the time it takes to create the n base files in the sample (typically a matter of seconds), it is possible, though unlikely, that confidential information common to both base files is included in the anonymous base file.

Our statistical model assumes that such an event is unlikely and derives the probability that such an anonymous base file would be generated as a function of the event probability. Denote by p_m the probability that m users share the same confidential information within the sampling window; the model assumes that p_m decreases exponentially as m increases.

Intuitively, the probability of the scenario described above should decrease significantly as m increases beyond 2. That is, it is much less probable that 3, 4, or more corporate cardholders would use the same credit card number during the short period in which the user-specific base file sample is chosen than that 2 such cardholders would.

We can state this formally with conditional probabilities as follows:

$$ p(\text{cardholder}_i \mid \text{cardholder}_{i+1}) \ll p(\text{cardholder}_i) $$

The model therefore assumes that the event probabilities decrease exponentially with m:

$$ p_m = p^m \quad (\text{that is, } p_1 = p,\; p_2 = p^2,\; \ldots) $$
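For the two values of p evaluated in Case 1 and Case 2, this model gives, for example:

$$
\begin{aligned}
p = 0.01:&\quad p_2 = 10^{-4},\; p_3 = 10^{-6},\; p_4 = 10^{-8}\\
p = 0.05:&\quad p_2 = 2.5\times10^{-3},\; p_3 = 1.25\times10^{-4},\; p_4 = 6.25\times10^{-6}
\end{aligned}
$$

Because p_{m+1}/p_m = p, each additional user who must share the same information multiplies the probability by p.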

In this model, it can be shown that the probability P_error of creating an anonymous base file that contains common confidential information in at least m of n user-specific base files is given by the following expression, which reproduces the values in Tables E-1 and E-2 (where n = 3m):

$$ P_{\text{error}} = \left(\frac{n\,e}{m}\right)^{m} p^{m(m+1)} $$

where e = 2.71828..., the base of the natural logarithm (Euler's number).
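The tables that follow can be regenerated with a short script. This is a sketch under one stated assumption: it uses the form P_error = (ne/m)^m p^(m(m+1)) with n = max(3m, 5), which reproduces the published values of Tables E-1 and E-2; the function name `p_error` is hypothetical.

```python
import math


def p_error(p, m):
    """Assumed model: P_error = (n*e/m)**m * p**(m*(m+1)), n = max(3m, 5)."""
    n = max(3 * m, 5)  # base file sample size
    return (n * math.e / m) ** m * p ** (m * (m + 1))


for p in (0.01, 0.05):
    print(f"p = {p:.0%}")
    for m in range(2, 8):
        pe = p_error(p, m)
        # Probability, and the equivalent "1 in N" ratio used in the tables.
        print(f"  m={m}  n={max(3 * m, 5)}  P_error={pe:.4E}  1 in {1 / pe:.4E}")
```

For m=2 and p=1%, this prints P_error=6.6501E-11, matching the first row of Table E-1. The smallest value involved (about 1E-106) is well within double-precision range, so no log-space arithmetic is needed.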

Case 1: p=1%

Here we assume that the probability of m cardholders using the same credit card number to execute an online transaction within the time required to generate n user-specific base files (on the order of a few seconds) is p_m = (0.01)^m.

Table E-1 lists P_error for p=1% (so that p_m = (0.01)^m), for m = 2 through 7, with n = max(3m, 5).

Table E-1 Case 1, p=1%

m    n     P_error        Ratio
2    6     6.6501E-11     1 in 1.5037E+10
3    9     5.4231E-22     1 in 1.8440E+21
4    12    4.4224E-37     1 in 2.2612E+36
5    15    3.6064E-56     1 in 2.7728E+55
6    18    2.9410E-79     1 in 3.4002E+78
7    21    2.3983E-106    1 in 4.1696E+105


In this model, with the recommended configuration value m=2 (and thus n=6), Table E-1 shows that the probability P_error of creating such an anonymous base file is about 1 in 1.5 x 10^10 (that is, 1 in 15 billion).

Case 2: p=5%

Here we assume that the probability of m cardholders using the same credit card number to execute an online transaction within the time required to generate n user-specific base files (on the order of a few seconds) is p_m = (0.05)^m.

Table E-2 lists P_error for p=5% (so that p_m = (0.05)^m), for m = 2 through 7, with n = max(3m, 5).

Table E-2 Case 2, p=5%

m    n     P_error        Ratio
2    6     1.0391E-06     1 in 9.6239E+05
3    9     1.3240E-13     1 in 7.5529E+12
4    12    4.2176E-23     1 in 2.3710E+22
5    15    3.3587E-35     1 in 2.9773E+34
6    18    6.6870E-50     1 in 1.4954E+49
7    21    3.3283E-67     1 in 3.0045E+66


Here, with the recommended configuration value m=2 (and thus n=6), Table E-2 shows that the probability P_error of creating such an anonymous base file is 1 in 9.6 x 10^5 (about 1 in 1 million).
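Because both tables use the same n for each m, comparing them isolates the effect of p. Assuming the form P_error = (ne/m)^m p^(m(m+1)), which reproduces both tables, the (ne/m)^m factor cancels:

$$
\frac{P_{\text{error}}(p = 0.05)}{P_{\text{error}}(p = 0.01)} = \left(\frac{0.05}{0.01}\right)^{m(m+1)} = 5^{m(m+1)},
\qquad m = 2:\; 5^{6} = 15\,625 \approx \frac{1.0391\times10^{-6}}{6.6501\times10^{-11}}
$$

A fivefold increase in the assumed per-user event probability therefore raises P_error by more than four orders of magnitude at m=2, which is why the result depends strongly on the assumed value of p.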

Under these assumptions, the model shows that the probability of generating an anonymous base file that contains common confidential information, and thus user-specific context, is extremely low: at the recommended m=2 it is about 1 in 15 billion for p=1% and about 1 in 1 million for p=5%, and it falls rapidly as m increases. As a result, this feature is a highly effective mechanism for enabling condensation of personalized or confidential content. When this feature is used together with SSL, the application appliance provides both confidentiality of condensed content and security of condensed content through SSL encryption.