### 1. Problem

Given a string, such as 01001010101001101011, we can randomly sliced multiple substrings. Assume that during the slicing, due to some unexpected noises, some characters may flip(0->1 or 1->0). For example:

Position: 0123456789......... String: 01001010101001101011 slice1: 1001110101000 slice2: 1010111001111 slice3: 10101 slice4: 1101011

In above sample, `slice1`

starts from position 1(assume that the index of a string begins with 0), `slice2`

starts from position 4, `slice3`

starts from 4, and `slice4`

starts from 13. In slice1, `0`

flips to `1`

at position 5 and `1`

flips to `0`

at position 13.

For one specific position in the original string, if it is `1`

, then the probability of flipping to 0 in a slicing is 0.1; and vice versa(i.e.Prob(0->1)=0.1).

The problem is: **if we only have multiple slices(the length of each slice may vary) and their starting positions in the string, and we don’t know the original string, given an arbitrary position in the original string, how can we calculate the probability that position is a 1?**

Assume most positions will be covered at least once in slices, and we have following parameters:

```
p01=0.1; // Probability a ‘0’ in string but flipped to a ‘1’ in a slice
p10=0.1; // Probability a ‘1’ in string but flipped to a ‘0’ in a slice
p1=0.5; // Prior probability that any given position in string is a ‘1’
```

We can also assume that the string is a random string of `0`

s and `1`

s, and during slicing, each position is sampled independently.

For the above example string and four slices, we already have following probabilities for each position:

```
Pos Prob
0 0.500
1 0.900
2 0.100
3 0.100
4 0.999
5 0.100
6 0.999
7 0.001
8 0.999
9 0.500
10 0.988
11 0.012
12 0.012
13 0.900
14 0.988
15 0.500
16 0.988
17 0.100
18 0.900
19 0.900
```

### 2. Solution

Obviously, the numbers of 0s and 1s in all slices for each position can be counted using a hashmap(`unordered_map`

in C++). However, the difficult part is to find a model to calculate the probabilities. Simple multiplicaiton or subtraction won’t work, because we cannot get `0.999`

with the given probabilities and arithmetic operations. Besides, the Bayes’ theorem also cannot be used directly, because we don’t know the number of slices and the computation would be too complicated. And we also may be tempted to try Markov Model, but Markov Model cannot fit all positions in the above example, such as positions 4(`1,1,1`

), 5(`1,0,0`

), 9(`0,1`

) and 13(`0,1,1`

).

The best model to fit this problem is the Binomial Distribution:

The binomial distribution models the total number of successes in repeated trials from an infinite population under the following conditions:

- Only two outcomes are possible on each of n trials.
- The probability of success for each trial is constant.
- All trials are independent of each other.

The probability density function(pdf) of Binomial Distribution is:

As for this problem, given a position in the original string, we first need to find the probability of a `0`

occurs in this position(in this case `0`

=>`1`

, say `bp01`

), and then calculate the probability of a `1`

occurs in this position(in this case `1`

=>`1`

, say `bp11`

). Finally, we can get the probability of `1`

in this position in the original string by: `p=bp11/(bp01+bp11)`

.

Take position 5 as an example. We get one `1`

and two `0`

s in position 5 in three observations. We can calculate `b11`

using Matlab `binopdf`

function. Since `p11=1-p10=0.9`

, therefore we have `bp11=binopdf(1,3,0.9)=0.027`

. As for `bp01`

, because `p01=0.1`

, we have `bp01=binopdf(1,3,0.1)=0.243`

. Therefore, we can calc the probability of `1`

in position 5 by `p=0.027/(0.243 + 0.027)=0.1`

.

### 3. Source Code

The implementation of this problem can be found in my GitHub Channel.