Hi,

I'm working on a very specific problem involving patents. Here is a toy example of the problem I've run into:

I want to find the cosine similarity between the patent profiles of two entities (called E1 and E2). Below are the profiles of the entities.

Here is the vector for E1:

Class A : 2 patents

Class B : 3 patents

Class C : 5 patents

Here is the vector for E2:

Class A : 4 patents

Class B : 6 patents

Class C : 1 patent

If I want to find the cosine similarity between the two entities, I just apply the cosine similarity formula to the two vectors (2,3,5) and (4,6,1). This gives C1 (cosine similarity at the first level).

Now here is the curveball: each class can be further divided into subclasses 1, 2, 3. The profiles of the entities then become:
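To make the first-level computation concrete, here is a minimal Python sketch using only the standard library. The function name `cosine_similarity` and the variable names are my own, chosen for illustration.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

e1_classes = [2, 3, 5]  # E1: patents in classes A, B, C
e2_classes = [4, 6, 1]  # E2: patents in classes A, B, C

c1 = cosine_similarity(e1_classes, e2_classes)
print(round(c1, 4))  # 31 / sqrt(38 * 53), about 0.6908
```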

E1 :

A.1: 1

A.2: 0

A.3: 1

B.1: 2

B.2: 1

B.3: 0

C.1: 4

C.2: 1

C.3: 0

And E2:

A.1: 2

A.2: 1

A.3: 1

B.1: 2

B.2: 1

B.3: 3

C.1: 0

C.2: 1

C.3: 0

So again, using the vectors (1,0,1,2,1,0,4,1,0) and (2,1,1,2,1,3,0,1,0), I can compute another cosine similarity at this second level, C2 (cosine similarity at the second level).
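The second-level computation can be sketched the same way. The snippet below also checks that summing the subclass counts in groups of three recovers the class-level vectors (2,3,5) and (4,6,1). The helper `to_classes` and all variable names are hypothetical, introduced only for this illustration.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def to_classes(sub, size=3):
    """Sum each consecutive group of `size` subclass counts into its class."""
    return [sum(sub[i:i + size]) for i in range(0, len(sub), size)]

# Order: A.1, A.2, A.3, B.1, B.2, B.3, C.1, C.2, C.3
e1_sub = [1, 0, 1, 2, 1, 0, 4, 1, 0]
e2_sub = [2, 1, 1, 2, 1, 3, 0, 1, 0]

# The subclass profiles aggregate back to the class-level profiles.
assert to_classes(e1_sub) == [2, 3, 5]
assert to_classes(e2_sub) == [4, 6, 1]

c2 = cosine_similarity(e1_sub, e2_sub)
print(round(c2, 4))  # 9 / sqrt(24 * 21), about 0.4009
```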

If I remove C1 from C2 I get a residual, which I will call C3. My question is: what is the formula for this residual? From my perspective, C2 is the "proximity" between the two profiles, C1 is the "unrelated proximity", and C3 would represent the "related proximity".

Any help is welcome, be it a solution or a pointer toward the right resources.
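To show why I'm unsure about the formula, here is a sketch of the most literal reading of "remove C1 from C2", namely a plain subtraction. This is only an assumption about what the residual could mean, not an established decomposition, and the names are again my own. For this toy data the subclass-level similarity is lower than the class-level one, so the plain difference comes out negative.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

c1 = cosine_similarity([2, 3, 5], [4, 6, 1])  # class level
c2 = cosine_similarity([1, 0, 1, 2, 1, 0, 4, 1, 0],
                       [2, 1, 1, 2, 1, 3, 0, 1, 0])  # subclass level

# Assumption: the residual is taken as a simple difference of the two cosines.
naive_residual = c2 - c1
print(round(naive_residual, 4))  # about -0.2899, i.e. C2 < C1 here
```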

Best regards,

A.