# Horizontal Fragmentation

In horizontal fragmentation, a relation (table) is divided into multiple sub-tables/sub-sets horizontally and by using the simple conditions.

For example, we have a relation (table) as Student(RollNo, Marks, University). Suppose the values for University attribute are ‘Stanford’, ‘California’, and ‘Harvard’, then the following SQL query can fragment the records (tuples) satisfying a simple condition.

SELECT * FROM Student WHERE University = ‘California’;

This query displays all the records related to the ‘California’ University, without any changes in the schema of the table. We can get three records if we change the University value in the WHERE clause of the above query, one for ‘California’, one for ‘Stanford’, and one for ‘Harvard’.

So this kind of horizontally fragmenting the whole table into multiple sub-tables without changing the table structure is called Horizontal Fragmentation. The concept is usually used to keep tuples (records) at the places where they are used the most, to minimize data transfer between far locations.

## Types of Horizontal Fragmentation

- Primary Horizontal Fragmentation (PHF)
- Derived Horizontal Fragmentation (DHF)

# Primary Horizontal Fragmentation (PHF)

Primary Horizontal Fragmentation is a table fragmentation technique in which we fragment a single table and this fragmentation is row-wise and using a set of simple conditions.

**Note:** Conditions are also called predicates.

Simple predicate

Given a table/relation R with set of attributes [A_{1}, A_{2}, A3, A4, …, A_{n}], a simple predicate P_{i} can be expressed as follows;

P_{i} : A_{j} θ Value

Where θ can be any of the symbols in the set {≤, ≥, ≠, <, >, =}, a value can be any value stored in the table for the attributed A _{i}. For example, consider the following table Student given in Figure 1;

RollNo | Marks | University |

T01 | 33 | Harvard |

T03 | 77 | Stanford |

T04 | 23 | California |

T02 | 89 | California |

T05 | 90 | Harvard |

T06 | 90 | Harvard |

T07 | 15 | Stanford |

Figure 1: Student table

For the above table, we could define any simple predicates like University = ‘California’, University= ‘Harvard’, Marks < 77 etc using the above expression “Aj θ Value”.

## Set of simple predicates

Set of simple predicates is set of all conditions collectively required to fragment a relation into subsets. For a table R, set of simple predicate can be defined as;

Predicate P = { P_{1}, P_{2}, …, P_{n}}

Example 1

As an example, for the above table Student, if simple conditions are, Marks < 77, Marks ≥ 77, then,

Set of simple predicates P1 = {Marks < 77, Marks ≥ 77}

Example 2

As another example, if simple conditions are, University = ‘California’, University= ‘Harvard’, Marks < 77, Marks ≥ 77, then,

Set of simple predicates P2 = { University = ‘California’, University= ‘Harvard’, Marks < 77, Marks ≥ 77}

## Min-term Predicate

When we fragment any relation horizontally, we use single condition or set of simple predicates to filter the data. Given a relation R and set of simple predicates, we can fragment a relation horizontally as follows;

Fragment, R_{i} = σ_{Fi}(R), 1 ≤ i ≤ n

where F_{i} is the set of simple predicates, also called as a Min-term predicate which can be written as follows;

Min-term predicate, M_{i}=P_{1} Λ P_{2} Λ P_{3} Λ … Λ P_{n}

Here, P_{1} means both P_{1} or ¬(P_{1}), P_{2} means both P_{2} or ¬(P_{2}), P_{3} means both P_{3} or ¬(P_{3}), and so on. Using the conjunctive form of various simple predicates in different combination, we can derive many such min-term predicates.

For the example 1 stated previously, we can derive set of min-term predicates using the rules stated above as follows;

We will get 2^{n} min-term predicates, where n is the number of simple predicates in the given predicate set. For P1, we have 2 simple predicates. Hence, we will get 4 (2^{2}) possible combinations of min-term predicates as follows;

m_{1} = {Marks < 77 Λ Marks ≥ 77}

m_{2} = {Marks < 77 Λ ¬(Marks ≥ 77)}

m_{3} = {¬(Marks < 77) Λ Marks ≥ 77}

m_{4} = {¬(Marks < 77) Λ ¬(Marks ≥ 77)}

Our next step is to choose the min-term predicates which can satisfy certain conditions to fragment a table and eliminate the others which are not useful. For example, the above set of min-term predicates can be applied each as a formula Fi stated in the above rule for fragment Ri as follows;

^{Student}1 ^{= }^{σ}Marks< 77 Λ Marks ≥ 77^{(Student)}

which can be written in equivalent SQL query as,

Student_{1}

SELECT * FROM Student WHERE Marks < 77 AND Marks ≥ 77;

^{Student}2 ^{= }^{σ}Marks< 77 Λ ¬(Marks ≥ 77)^{(Student)}

which can be written in equivalent SQL query as,

Student_{2}

SELECT * FROM Student WHERE Marks < 77 AND NOT Marks ≥ 77; where NOT Marks ≥ 77 is equivalent to Marks < 77.

^{Student}3 ^{= σ}¬(Marks< 77) Λ Marks ≥ 77^{(Student)}

which can be written in equivalent SQL query as,

Student_{3}

SELECT * FROM Student WHERE NOT Marks < 77 AND Marks ≥ 77; where NOT Marks < 77 is equivalent to Marks ≥ 77.

^{Student}4 ^{= σ}¬(Marks< 77) Λ ¬(Marks ≥ 77)^{(Student)}

which can be written in equivalent SQL query as,

Student_{4}

SELECT * FROM Student WHERE NOT Marks < 77 AND NOT Marks ≥ 77;

where NOT Marks < 77 is equivalent to Marks ≥ 77 and NOT Marks ≥ 77 is equivalent to Marks < 77. This is exactly same as the query for fragment Student_{1}.

From these examples, it is very clear that the first query for fragment Student_{1} (min-term predicate m_{1}) is invalid as any record in a table cannot have two values for any attribute in one record. That is, the condition (Marks < 77 Λ Marks ≥ 77) requires that the value for Marks must both be less than 77 and greater and equal to 77, which is not possible. Hence the condition violates and can be eliminated. For fragment Student_{2} (min-term predicate m_{2}), the condition is (Marks<77 and Marks<77) which ultimately means Marks<77 which is correct. Likewise, fragment Student_{3} is valid and Student_{4} must be eliminated. Finally, we use the min-term predicates m2 and m3 to fragment the Student relation. The fragments can be derived as follows for Student;

SELECT * FROM Student WHERE Marks < 77;

Student_{2}

RollNo | Marks | University |

T01 | 33 | Harvard |

T04 | 23 | California |

T07 | 15 | Stanford |

T05 | 90 | Harvard |

T06 | 90 | Harvard |

SELECT * FROM Student WHERE Marks ≥ 77;

Student_{3}

RollNo | Marks | University |

T03 | 77 | Stanford |

T02 | 89 | California |

## Correctness of Fragmentation

We horizontally fragment the table with the help of min term predicates. Now, we need to validate all the chosen fragments for their correctness. We need to verify did we miss anything? We use the following rules to ensure that we have not changed semantic information about the table which we fragment.

## Completeness

If a relation R is fragmented into a set of fragments, then a record of Relation R must be found in any one or more of the fragments. This rule ensures that we have not lost any records during the process of fragmentation.

## Reconstruction

After fragmenting a table, we must be able to reconstruct it back to its original form without any data loss through some relational operation. Reconstruction can be performed with the help of union operation.This rule ensures that we can construct a base table back from its fragments without losing any information. That is, we can write any queries involving the join of fragments to get the original relation back. For example;

(SELECT * FROM Student2) Union (SELECT * FROM Student3);

Disjointness – If a relation R is fragmented into a set of sub-tables R_{1}, R_{2}, R3, …, R_{n}, a record belongs to R_{1} is not found in any other fragment. This ensures that R_{1} ≠ R_{2}.

For example, consider the Student table in Figure 1 and its fragments Student_{2}, and Student_{3} created using the min-term predicates we derived.

From the tables Student_{2}, and Student_{3} it is clear that the fragmentation is Complete. That is, we have not missed any records.

Just all are included into one of the sub-tables.

When we use an operation, say Union between Student_{2}, and Student_{3} we will be able to get the original relation Student.

(SELECT * FROM Student2) Union (SELECT * FROM Student3);

The above query will get us Student back without loss of any information. Hence, the fragments created can be reconstructed.

Finally, if we write a query as follows, we will get a Null set as output. It ensures that the Disjointness property is satisfied.

(SELECT * FROM Student2) Intersect (SELECT * FROM Student3);

We get a null set as result for this query because there is no record common in both relations Student_{2} and Student_{3}.

For the example 2, recall the set of simple predicates which was as follows;

Set of simple predicates P2 = { University = ‘California’, University= ‘Harvard’, Marks < 77, Marks ≥ 77}

We can derive the following min-term predicates;

m_{1} = { University = ‘California’ Λ University= ‘Harvard’ Λ Marks < 77 Λ Marks ≥ 77} m_{2} = { University = ‘California’ Λ University= ‘Harvard’ Λ Marks < 77 Λ ¬(Marks ≥ 77)} m_{3} = { University = ‘California’ Λ University= ‘Harvard’ Λ ¬(Marks < 77) Λ Marks ≥ 77} m_{4} = { University = ‘California’ Λ ¬(University= ‘Harvard’) Λ Marks < 77 Λ Marks ≥ 77}

m_{n} = { ¬(University = ‘California’) Λ ¬(University= ‘Harvard’) Λ ¬(Marks < 77) Λ ¬(Marks ≥ 77)}

As in the previous example, out of 16 (2^{4}) min-term predicates, the set of min-term predicates which are not valid should be eliminated. At last, we would have the following set of valid min-term predicates.

m_{1} = { University = ‘California’ Λ ¬(University= ‘Harvard’) Λ ¬(Marks < 77) Λ Marks ≥ 77} m_{2} = { University = ‘California’ Λ ¬(University= ‘Harvard’) Λ Marks < 77 Λ ¬(Marks ≥ 77)} m_{3} = { ¬(University = ‘California’) Λ University= ‘Harvard’ Λ ¬(Marks < 77) Λ Marks ≥ 77} m_{4} = { ¬(University = ‘California’) Λ University= ‘Harvard’ Λ Marks < 77 Λ ¬(Marks ≥ 77)}

m_{5} = { ¬(University = ‘California’) Λ ¬(University= ‘Harvard’) Λ ¬(Marks < 77) Λ Marks ≥ 77}

m_{6} = { ¬(University = ‘California’) Λ ¬(University= ‘Harvard’) Λ Marks < 77 Λ ¬(Marks ≥ 77)}

The horizontal fragments using the above set of min-term predicates can be generated as follows;

Fragment 1: SELECT * FROM Student WHERE University = ‘California’ AND Marks ≥ 77;

Fragment 2: SELECT * FROM Student WHERE University = ‘California’ AND Marks < 77;

Fragment 3: SELECT * FROM Student WHERE University = ‘Harvard’ AND Marks ≥ 77;

Fragment 4: SELECT * FROM Student WHERE University = ‘Harvard’ AND Marks < 77;

The horizontal fragments using the above set of min-term predicates can be generated as follows;

Fragment 1: SELECT * FROM Student WHERE University = ‘California’ AND Marks ≥ 77;

Student_{1}

RollNo | Marks | University |

T02 | 89 | California |

Fragment 2: SELECT * FROM Student WHERE University = ‘California’ AND Marks < 77;

Student_{2}

RollNo | Marks | University |

T02 | 23 | California |

Fragment 3: SELECT * FROM Student WHERE University = ‘Harvard’ AND Marks ≥ 77;

Student_{3}

RollNo | Marks | University |

T05 | 90 | Harvard |

T06 | 90 | Harvard |

Fragment 4: SELECT * FROM Student WHERE University = ‘Harvard’ AND Marks < 77;

Student_{4}

RollNo | Marks | University |

T01 | 33 | Harvard |

In the STUDENT table, we have the third University ‘Stanford’, which was not specified in the set of simple predicates. Hence, in the fragmentation process, we must not leave the tuple with the value ‘Stanford’. That is the reason we have included the min-term predicates m_{5} and m_{6} which can be derived as follows;

Fragment 5: SELECT * FROM Student WHERE University <> ‘Harvard’ AND University <> ‘California’ AND Marks ≥ 77;

Student_{5}

RollNo | Marks | University |

T03 | 77 | Stanford |

Fragment 6: SELECT * FROM Student WHERE University <> ‘Harvard’ AND University <> ‘California’ AND Marks < 77;

Student_{6}

RollNo | Marks | University |

T07 | 15 | Stanford |

## The correctness of fragmentation:

### Completeness

The tuple of the table Student is distributed into different fragments. No records were omitted. Otherwise, by performing the union operation between all the Student table fragments Student_{1}, Student_{2}, Student_{3}, and Student_{4}, we will be able to get Student back without any information loss. Hence, the above fragmentation is Complete.

Reconstruction

As said before, by performing Union operation between all the fragments, we will be able to get the original table back. Hence, the fragmentation is correct and the reconstruction property is satisfied.

### Disjointness

When we perform Intersect operation between all the above fragments, we will get null set as result, as we do not have any records in common for all the fragments. So we can say that disjointness property is satisfied.