Managing Query Execution in Database Engine Part – 2
Query Management
Before query optimization is attempted, one must first decide what is to be improved, since the goal differs from one situation to another. In some cases the aim is to reduce execution time, in others to reduce response time, and in still others to reduce I/O, memory consumption, network time, or some combination of these, for example the total resources consumed. In general, a query processing algorithm QPA1 is considered more efficient than an algorithm QPA2 if the cost of processing the same query with the same resources is normally lower with QPA1 than with QPA2.
To illustrate the need for optimization, consider a simple query that can be processed in several different ways. The following query retrieves, from the current in-patient records, the names of patients and of their doctors for those patients whose disease type is '4', the code for "Cancer".
SELECT Patient.Name, Doctor.Name
FROM Patient, Doctor
WHERE Patient.Pat_ID = Doctor.Pat_ID AND Patient.Disease_Type = '4'
To execute this query, a join of the two tables and the two conditions in the WHERE clause must be evaluated. There are several different orders in which these operations can be carried out; a few of them are listed below (a small sketch contrasting two of these orderings follows the list).
1. Form the product of Patient and Doctor, apply the join condition Patient.Pat_ID = Doctor.Pat_ID to it, and then apply the selection condition.
2. Join Patient and Doctor on Pat_ID, and then restrict the joined result on Patient.Disease_Type.
3. Apply the selection condition Patient.Disease_Type = '4' to Patient first, and then join the much smaller result with Doctor on Pat_ID.
4. Retrieve the Patient tuples that satisfy the Disease_Type condition, join that outcome with Doctor, and then apply any remaining condition.
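All of these orderings produce the same answer; they differ only in how much intermediate data they handle. The small Python sketch below, which is not part of the original example and uses purely hypothetical sample rows, contrasts a join-then-select ordering with a select-then-join ordering.

    # Hypothetical sample rows, for illustration only.
    patient_rows = [
        {"Pat_ID": 1, "Name": "Asha", "Disease_Type": "4"},
        {"Pat_ID": 2, "Name": "Ravi", "Disease_Type": "2"},
    ]
    doctor_rows = [
        {"Pat_ID": 1, "Name": "Dr. Rao"},
        {"Pat_ID": 2, "Name": "Dr. Sen"},
    ]

    def join_then_select():
        # Join every matching pair first, then restrict on Disease_Type.
        joined = [(p, d) for p in patient_rows for d in doctor_rows
                  if p["Pat_ID"] == d["Pat_ID"]]
        return [(p["Name"], d["Name"]) for p, d in joined if p["Disease_Type"] == "4"]

    def select_then_join():
        # Restrict Patient first, then join the much smaller result with Doctor.
        selected = [p for p in patient_rows if p["Disease_Type"] == "4"]
        return [(p["Name"], d["Name"]) for p in selected for d in doctor_rows
                if p["Pat_ID"] == d["Pat_ID"]]

    print(join_then_select())   # same answer either way ...
    print(select_then_join())   # ... but the second builds a far smaller intermediate result

Even on these tiny lists the second function handles far less intermediate data, which is exactly the effect the cost estimates below quantify in terms of block accesses.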
Before comparing the costs of the four alternatives above, we should recognize why estimating the cost of a plan matters. A database is generally disk-oriented, so the cost of reading from and writing to disk usually dominates the cost of executing a query; we can therefore estimate the cost of executing a query in terms of disk accesses, or block accesses. Estimating the number of block accesses needed to process even a simple query is not necessarily straightforward, since it depends on how the data is stored and on which indexes, if any, are available. In some database systems relations are stored in a packed format, that is, every block holds tuples of only one relation; other systems may store tuples from several relations in each block, which makes scanning a whole relation considerably more expensive.
Let us now compare the costs of the four choices listed above. Since precise cost calculations are difficult, we use simple approximations. Consider a database in which the Patient relation occupies about 2,000 blocks and the Doctor relation about 100 blocks. For simplicity, assume that blocks are 1 KB in size and that about 20 Patient tuples fit in a block; for Doctor, assume a smaller tuple size, giving about 25 tuples per block. With these figures we can now estimate the costs of the four plans.
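The following lines simply restate the figures assumed above in code form, so they can be reused in the cost sketches that follow; the numbers are the rough values quoted in the text, not measurements.

    patient_blocks = 2000          # size of the Patient relation in blocks
    doctor_blocks = 100            # size of the Doctor relation in blocks
    patient_tuples_per_block = 20  # blocking factor assumed above for Patient
    doctor_tuples_per_block = 25   # blocking factor assumed above for Doctor

    patient_tuples = patient_blocks * patient_tuples_per_block  # roughly 40,000 tuples
    doctor_tuples = doctor_blocks * doctor_tuples_per_block     # roughly 2,500 tuples
    print(patient_tuples, doctor_tuples)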
The cost of the first query plan can now be estimated. Suppose the join is computed by reading a block of the first relation and then scanning the whole of the second relation to find the matching tuples, then reading the next block of the first relation and scanning the second relation again, and so on. This technique is known as nested scanning (a nested-loop join) and is not particularly efficient; the efficiency of the algebraic operations is discussed later in this series (Parts 3 and 4). The cost of R ⋈ S can therefore be estimated as the number of blocks in R times the number of blocks in S. Since Doctor occupies 100 blocks and Patient 2,000 blocks, the total number of blocks read while computing the join of Doctor and Patient is 100 × 2,000 = 200,000. The result of the join contains a very large number of tuples, since every tuple of Doctor matches a tuple of Patient, and each joined tuple is about 130 bytes long, being a Doctor tuple concatenated with the matching Patient tuple.
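As a worked check of that figure, the sketch below codes the rough approximation used here, in which the whole inner relation is rescanned for every block of the outer relation; more careful formulations of nested-loop cost also add the blocks of the outer relation itself, a refinement omitted here to stay with the text's estimate.

    def nested_scan_cost(outer_blocks, inner_blocks):
        # Rough approximation used above: every block of the outer relation
        # triggers a full scan of the inner relation.
        return outer_blocks * inner_blocks

    print(nested_scan_cost(100, 2000))  # 100 x 2000 = 200,000 block accesses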
To estimate the cost of the second query plan, recall that the cost of computing the join of Patient and Doctor was estimated above at about 200,000 block accesses. Applying the conditions to the result of the join reduces the output to roughly five to ten tuples, that is, about one or two blocks, but the entire join result must be read to apply them, so this step costs a large number of disk accesses. Applying the restriction directly to the base relation instead reduces that relation to about twenty tuples (two blocks); this restriction costs about one hundred block accesses, after which the join requires only about three block accesses.
To estimate the cost of the third query plan, we must estimate the size of the result of the selection and the cost of computing it. The cost of the selection is the cost of reading the Patient relation and writing the result: the reading cost is fairly high, about 2,000 block accesses, but the writing cost is very small because the selection leaves only a handful of Patient tuples. The cost of computing the join of the restricted Patient with Doctor then consists mainly of the cost of reading Doctor, and since the result is small the cost of writing it back is also minor. The total cost of the third plan is therefore roughly the cost of reading the Patient relation plus the cost of reading the Doctor relation.
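Pulling the figures quoted above together, the comparison below contrasts the roughly 200,000 block accesses of the first plan with the roughly 2,100 of the third; these are the article's own rough estimates, not measured values.

    plan1_cost = 100 * 2000        # nested scan of the full relations
    plan3_cost = 2000 + 100        # read Patient once, then read Doctor once
    print(plan1_cost, plan3_cost)  # 200000 versus 2100 block accesses
    print(round(plan1_cost / plan3_cost))  # roughly a 95-fold difference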
Similar estimates can be made for the fourth plan, but there is no need to work them out here: the estimates above are already enough to show that a brute-force approach to query execution is unlikely to be efficient, and that the cost can be reduced considerably by choosing a better query plan. Optimization itself is, of course, much harder than the cost estimation shown above, since these estimates did not even consider the alternative access paths that may be available to the system for reaching each relation.
The cost estimates above assumed that the cost of accessing secondary storage dominates query processing cost. This is often a reasonable assumption, although the cost of communication can be very significant when dealing with a distributed system, and the cost of storage can matter in large databases where some queries require large intermediate results. CPU cost is always relevant as well, and it is not unusual for database systems to be CPU-bound rather than I/O-bound, contrary to what is generally assumed. In this article we consider a centralized system in which the cost of accessing secondary storage is taken to dominate the other costs, even though, as noted, this is not always the case. For example, one system, call it S1, uses a cost measure of the form:
COST = No. of PAGES FETCHED + CPU OPERATIONS
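The measure above adds an I/O term to a CPU term. A minimal sketch of how such a combined measure might be computed is given below; the weighting factor w, which converts CPU operations into page-fetch equivalents, is an illustrative assumption and not a value taken from any particular system.

    def estimated_cost(pages_fetched, cpu_operations, w=0.001):
        # w is an illustrative weight that expresses CPU operations
        # in units comparable to page fetches.
        return pages_fetched + w * cpu_operations

    print(estimated_cost(pages_fetched=2100, cpu_operations=50000))  # 2100 + 50 = 2150.0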
Once a query is submitted to a Database Management System (DBMS), the system must choose the best way to execute it, given the information it holds about the database. The optimization part of query processing normally consists of the following steps (a small sketch of the final selection step follows the list).
1. Construction of a suitable internal representation of the query
2. Logical transformation of the query
3. Generation of the alternative access paths
4. Estimation of the cost of each alternative and choice of the cheapest plan
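The last two steps amount to enumerating candidate plans, estimating a cost for each, and keeping the cheapest one. The sketch below illustrates only that final selection loop; the two candidate plans and their costs are the rough block-access figures used earlier in this article and are purely illustrative.

    candidate_plans = {
        "join first, then select": 200000,  # rough estimate from the first plan above
        "select first, then join": 2100,    # rough estimate from the third plan above
    }
    best_plan = min(candidate_plans, key=candidate_plans.get)
    print(best_plan, candidate_plans[best_plan])  # the optimizer keeps the cheapest plan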
A detailed discussion of these steps is provided in the subsequent parts of this article series.
In the next part we will discuss the internal query representation process, logical transformations, and the heuristic method for query processing (continued in Part 4).