Database Management System,Advantages and disadvantages of DBMS,Types of DBMS,File processing system ,disadvantages,Database languages,Entity relationship diagram.
Database management system:
A database management system is the software system that allows users to define, create and maintain a database and provides controlled access to the data. A Database Management System (DBMS) is basically a collection of programs that enables users to store, modify, and extract information from a database as per the requirements. DBMS is an intermediate layer between programs and the data. Programs access the DBMS, which then accesses the data. There are different types of DBMS ranging from small systems that run on personal computers to huge systems that run on mainframes. The following are main examples of database applications: • Computerized library systems • Automated teller machines • Flight reservation systems • Computerized parts inventory systems A database management system is a piece of software that provides services for accessing a database, while maintaining all the required features of the data. Commercially available Database management systems in the market are dbase, FoxPro, IMS and Oracle, MySQL, SQL Servers and DB2 etc. These systems allow users to create update, and extract information from their databases.Compared to a manual filing system, the biggest advantages to a computerized database system are speed, accuracy, and’ accessibility.
Advantages of DBMS:
a)Controls Redundancy: In file system, each application has its own private files, which cannot be shared between multiple applications. 1:his can often lead to considerable redundancy in the stored data, which results in wastage of storage space. By having centralized database most of this can be avoided. It is not possible that all redundancy should be eliminated. Sometimes there are sound business and technical reasons for maintaining multiple copies of the same data. In a database system, however this redundancy can be controlled.
b) Integrity can be enforced: Integrity of data means that data in database is always accurate, such that incorrect information cannot be stored in database. In order to maintain the integrity of data, some integrity constraints are enforced on the database. A DBMS should provide capabilities for defining and enforcing the constraints.
c) Inconsistency can be avoided : When the same data is duplicated and changes are made at one site, which is not propagated to the other site, it gives rise to inconsistency and the two entries regarding the same data will not agree. At such times the data is said to be inconsistent. So, if the redundancy is removed chances of having inconsistent data is also removed.
d) Data can be shared: As explained earlier, the data about Name, Class, Father __name etc. of General_Office is shared by multiple applications in centralized DBMS as compared to file system so now applications can be developed to operate against the same stored data. The applications may be developed without having to create any new stored files.
e) Standards can be enforced : Since DBMS is a central system, so standard can be enforced easily may be at Company level, Department level, National level or International level. The standardized data is very helpful during migration or interchanging of data. The file system is an independent system so standard cannot be easily enforced on multiple independent applications.
f) Restricts unauthorized access: When multiple users share a database, it is likely that some users will not be authorized to access all information in the database. For example, account office data is often considered confidential, and hence only authorized persons are allowed to access such data. In addition, some users may be permitted only to retrieve data, whereas other are allowed both to retrieve and to update. Hence, the type of access operation retrieval or update must also be controlled. Typically, users or user groups are given account numbers protected by passwords, which they can use to gain access to the database. A DBMS should provide a security and authorization subsystem, which the DBA uses to create accounts and to specify account restrictions. The DBMS should then enforce these restrictions automatically.
g)Solves Enterprise Requirement than Individual Requirement: Since many types of users with varying level of technical knowledge use a database, a DBMS should provide a variety of user interface. The overall requirements of the enterprise are more important than the individual user requirements. So, the DBA can structure the database system to provide an overall service that is “best for the enterprise”.
h) Provides Backup and Recovery: A DBMS must provide facilities for recovering from hardware or software failures. The backup and recovery subsystem of the DBMS is responsible for recovery. For example, if the computer system fails in the middle of a complex update program, the recovery subsystem is responsible for making sure that the .database is restored to the state it was in before the program started executing.
i)Cost of developing and maintaining system is lower: It is much easier to respond to unanticipated requests when data is centralized in a database than when it is stored in a conventional file system. Although the initial cost of setting up of a database can be large, but the cost of developing and maintaining application programs to be far lower than for similar service using conventional systems. The productivity of programmers can be higher in using non-procedural languages that have been developed with DBMS than using procedural languages.
j) Data Model can be developed : The centralized system is able to represent the complex data and interfile relationships, which results better data modeling properties. The data madding properties of relational model is based on Entity and their Relationship, which is discussed in detail in chapter 4 of the book.
k)Concurrency Control : DBMS systems provide mechanisms to provide concurrent access of data to multiple users.
Disadvantages of DBMS
a) Complexity : The provision of the functionality that is expected of a good DBMS makes the DBMS an extremely complex piece of software. Database designers, developers, database administrators and end-users must understand this functionality to take full advantage of it. Failure to understand the system can lead to bad design decisions, which can have serious consequences for an organization.
b) Size : The complexity and breadth of functionality makes the DBMS an extremely large piece of software, occupying many megabytes of disk space and requiring substantial amounts of memory to run efficiently.
c) Performance: Typically, a File Based system is written for a specific application, such as invoicing. As result, performance is generally very good. However, the DBMS is written to be more general, to cater for many applications rather than just one. The effect is that some applications may not run as fast as they used to.
d)Higher impact of a failure: The centralization of resources increases the vulnerability of the system. Since all users and applications rely on the ~vailabi1ity of the DBMS, the failure of any component can bring operations to a halt.
e) Cost of DBMS: The cost of DBMS varies significantly, depending on the environment and functionality provided. There is also the recurrent annual maintenance cost.
f) Additional Hardware costs: The disk storage requirements for the DBMS and the database may necessitate the purchase of additional storage space. Furthermore, to achieve the required performance it may be necessary to purchase a larger machine, perhaps even a machine dedicated to running the DBMS. The procurement of additional hardware results in further expenditure.
g) Cost of Conversion: In some situations, the cost oftlle DBMS and extra hardware may be insignificant compared with the cost of converting existing applications to run on the new DBMS and hardware. This cost also includes the cost of training staff to use these new systems and possibly the employment of specialist staff to help with conversion and running of the system. This cost is one of the main reasons why some organizations feel tied to their current systems and cannot switch to modern database technology.
Type of Database System:
On the basis of the number of users:
• Single-user DBMS
• Multi-user DBMS
On the basis of the site location
• Centralized DBMS
• Parallel DBMS
• Distributed DBMS
• Client/server DBMS
The database system may be multi-user or single-user. The configuration of the hardware and the size of the organization will determine whether it is a multi-user system or a single user system. In single user system the database resides on one computer and is only accessed by one user at a time. This one user may design, maintain, and write database programs. Due to large amount of data management most systems are multi-user. In this situation the data are both integrated and shared. A database is integrated when the same information is not recorded in two places. For example, both the Library department and the Account department of the college database may need student addresses. Even though both departments may access different portions of the database, the students’ addresses should only reside in one place. It is the job of the DBA to make sure that the DBMS makes the correct addresses available from one central storage area.
On the basis of site location
a)Centralized Database System:
The centralized database system consists of a single processor together with its associated data storage devices and other peripherals. It is physically confined to a single location. Data can be accessed from the multiple sites with the use of a computer network while the database is maintained at the central site.
Disadvantages of Centralized Database System
• When the central site computer or database system goes down, then every one (users) is blocked from using the system until the system comes back. • Communication costs from the terminals to the central site can be expensive.
b)Parallel Database System:
Parallel database system architecture consists of a multiple Central Processing Units (CPUs) and data storage disk in parallel. Hence, they improve processing and Input/Output (I/O) speeds. Parallel database systems are used in the application that have to query extremely large databases or that have to process an extremely large number of transactions per second.
Advantages of a Parallel Database System
• Parallel database systems are very useful for the applications that have to query extremely large databases (of the order of terabytes, for example, 1012 bytes) or that have to process an extremely large number of transactions per second (of the order of thousands of transactions per second). • In a parallel database system, the throughput (that is, the number of tasks that can be completed in a given time interval) and the response time (that is, the amount of time it takes to complete a single task from the time it is· submitted) are very high.
Disadvantages of a Parallel Database System
• In a parallel database system, there· is a startup cost associated with initiating a single process and the startup-time may overshadow the actual processing time, affecting speedup adversely. • Since process executing in a parallel system often access shared resources, a slowdown may result from interference of each new process as it completes with existing processes for commonly held resources, such as shared data storage disks, system bus and so on.
c)Distributed Database System:
A logically interrelated collection of shared data physically distributed over a computer network is called as distributed database and the software system that permits the management of the distributed database and makes the distribution transparent to users is called as Distributed DBMS. It consists of a single logical database that is split into a number of fragments. Each fragment is stored on one or more computers under the control of a separate DBMS, with the computers connected by a communications network. As shown, in distributed database system, data is spread across a variety of different databases. These are managed by a variety of different DBMS software running on a variety of different operating systems. These machines are spread (or distributed) geographically and connected together by a variety of communication networks. \
Advantages of Distributed Database System
• Distributed database architecture provides greater efficiency and better performance. • A single database (on server) can be shared across several distinct client (application) systems. • As data volumes and transaction rates increase, users can grow the system incrementally. • It causes less impact on ongoing operations when adding new locations. • Distributed database system provides local autonomy. Disadvantages of Distributed Database System • Recovery from failure is more complex in distributed database systems than in centralized systems.
Client/Server architecture of database system has two logical components namely client, and server. Clients are generally personal computers or workstations whereas server is large workstations, mini range computer system or a mainframe computer system. The applications and tools of DBMS run on one or more client platforms, while the DBMS soft wares reside on the server. The server computer is caned backend and the client’s computer is called front end. These server and client computers are connected into a network. The applications and tools act as clients of the DBMS, making requests for its services. The DBMS, in turn, processes these requests and returns the results to the client(s). Client/Server architecture handles the Graphical User Interface (GUI) and does computations and other programming of interest to the end user. The server handles parts of the job that are common to many clients, for example, database access and updates.
Multi-Tier client server computing models
In a single-tier system the database is centralized, which means the DBMS Software and the data reside in one location and the dumb terminals were used to access the DBMS as shown. The rise of personal computers in businesses during the 1980s, the increased reliability of networking hardware causes Two-tier and Three-tier systems became common. In a two-tier system, different software is required for the server and for the client. Illustrates the two-tier client server model. At the early stages client server computing model was called two-tier-computing model in which client is considered as data capture and validation tier and Server was considered as data storage tier. This scenario is depicted.
Problems of two-tier architecture
The need of enterprise scalability challenged this traditional two-tier client-server model. In the mid-1990s, as application became more complex and could be deployed to hundreds or thousands of end-users, the client side, now undergoes with following problems: • A’ fat’ client requiring considerable resources on client’s computer to run effectively. This includes disk space, RAM and CPU. • Client machines require administration which results overhead.
By 1995, three-tier architecture appears as improvement over two-tier architecture. It has three layers, which are: • First Layer: User Interface which runs on end-user’s computer (the client) . • Second Layer: Application Server It is a business logic and data processing layer. This middle tier runs on a server which is called as Application Server. • Third Layer: Database Server It is a DBMS, which stores the data required by the middle tier. This tier may run on a separate server called the database server. As, described earlier, the client is now responsible for application’s user interface, thus it requires less computational resources now clients are called as ‘thin client’ and it requires less maintenance.
Advantages of Client/Server Database System
• Client/Server system has less expensive platforms to support applications that had previously been running only on large and expensive mini or mainframe computers • Client offer icon-based menu-driven interface, which is superior to the traditional command-line, dumb terminal interface typical of mini and mainframe computer systems. • Client/Server environment facilitates in more productive work by the users and making better use of existing data. • Client/Server database system is more flexible as compared to the Centralized system. • Response time and throughput is high. • The server (database) machine can be custom-built (tailored) to the DBMS function and thus can provide a better DBMS performance. • The client (application database) might be a personnel workstation, tailored to the needs of the end users and thus able to provide better interfaces, high availability, faster responses and overall improved ease of use to the user. + A single database (on server) can be shared across several distinct client (application) systems.
Disadvantages of Client/Server Database System
• Programming cost is high in client/server environments, particularly in initial phases. • There is a lack of management tools for diagnosis, performance monitoring and tuning and security control, for the DBMS, client and operating systems and networking environments.
A DBMS must provide appropriate languages and interfaces for each category of users to express database queries and updates. Database Languages are used to create and maintain database on computer. There are large numbers of database languages like Oracle, MySQL, MS Access, dBase, FoxPro etc. SQL statements commonly used in Oracle and MS Access can be categorized as data definition language (DDL), data control language (DCL) and data manipulation language (DML).
a)Data Definition Language (DDL):
It is a language that allows the users to define data and their relationship to other types of data. It is mainly used to create files, databases, data dictionary and tables within databases.It is also used to specify the structure of each table, set of associated values with each attribute, integrity constraints, security and authorization information for each table and physical storage structure of each table on disk.
b)Data Manipulation Language (DML):
It is a language that provides a set of operations to support the basic data manipulation operations on the data held in the databases. It allows users to insert, update, delete and retrieve data from the database. The part of DML that involves data retrieval is called a query language.
c)Data Control Language (DCL):
DCL statements control access to data and the database using statements such as GRANT and REVOKE. A privilege can either be granted to a User with the help of GRANT statement. The privileges assigned can be SELECT, ALTER, DELETE, EXECUTE, INSERT, INDEX etc. In addition to granting of privileges, you can also revoke (taken back) it by using REVOKE command.
File Processing System:
File processing systems was an early attempt to computerize the manual filing system that we are all familiar with. A file system is a method for storing and organizing computer files and the data they contain to make it easy to find and access them. File systems may use a storage device such as a hard disk or CD-ROM and involve maintaining the physical location of the files. In our own home, we probably have some sort of filing system, which contains receipts, guarantees, invoices, bank statements, and such like. When we need to look something up, we go to the filing system and search through the system starting from the first entry until we find what we want. Alternatively, we may have an indexing system that helps to locate what we want more quickly. For example we may have divisions in the filing system or separate folders for different types of item that are in some way logically related.The manual filing system works well when the number of items to be stored is small. It even works quite adequately when there are large numbers of items and we have only to store and retrieve them. However, the manual filing system breaks down when we have to cross-reference or process the information in the files. For example, a typical real estate agent’s office might have a separate file for each property for sale or rent, each potential buyer and renter, and each member of staff. Clearly the manual system is inadequate for this’ type of work. The file based system was developed in response to the needs of industry for more efficient data access. In early processing systems, an organization’s information was stored as groups of records in separate files.In the traditional approach, we used to store information in flat files which are maintained by the file system under the operating system’s control. Here, flat files are files containing records having no structured relationship among them. The file handling which we learn under C/C ++ is the example of file processing system. The Application programs written in C/C ++ like programming languages go through the file system to access these flat.
Characteristics of File Processing System:
• It is a group of files storing data of an organization. • Each file is independent from one another. • Each file is called a flat file. • Each file contained and processed information for one specific function, such as accounting or inventory. • Files are designed by using programs written in programming languages such as COBOL, C, C++. • The physical implementation and access procedures are written into database application; therefore, physical changes resulted in intensive rework on the part of the programmer. • As systems became more complex, file processing systems offered little flexibility, presented many limitations, and were difficult to maintain.
Limitations of the File Processing System I File-Based Approach:
1.Separated and Isolated Data: To make a decision, a user might need data from two separate files. First, the files were evaluated by analysts and programmers to determine the specific data required from each file and the relationships between the data and then applications could be written in a programming language to process and extract the needed data. Imagine the work involved if data from several files was needed.
2.Duplication of data: Often the same information is stored in more than one file. Uncontrolled duplication of data is not required for several reasons, such as: • Duplication is wasteful. It costs time and money to enter the data more than once • It takes up additional storage space, again with associated costs. • Duplication can lead to loss of data integrity; in other words the data is no longer consistent. For example, consider the duplication of data between the Payroll and Personnel departments. If a member of staff moves to new house and the change of address is communicated only to Personnel and not to Payroll, the person’s pay slip will be sent to the wrong address. A more serious problem occurs if an employee is promoted with an associated increase in salary. Again, the change is notified to Personnel but the change does not filter through to Payroll. Now, the employee is receiving the wrong salary. When this error is detected, it will take time and effort to resolve. Both these examples, illustrate inconsistencies that may result from the duplication of data. As there is no automatic way for Personnel to update the data in the Payroll files, it is difficult to foresee such inconsistencies arising. Even if Payroll is notified of the changes, it is possible that the data will be entered incorrectly.
3. Data Dependence: In file processing systems, files and records were described by specific physical formats that were coded into the application program by programmers. If the format of a certain record was changed, the code in each file containing that format must be updated. Furthermore, instructions for data storage and access were written into the application’s code. Therefore, .changes in storage structure or access methods could greatly affect the processing or results of an application. In other words, in file based approach application programs are data dependent. It means that, with the change in the physical representation (how the data is physically represented in disk) or access technique (how it is physically accessed) of data, application programs are also affected and needs modification. In other words application programs are dependent on the how the data is physically stored and accessed.
4. Difficulty in representing data from the user’s view: To create useful applications for the user, often data from various files must be combined. In file processing it was difficult to determine relationships between isolated data in order to meet user requirements.
5. Data Inflexibility: Program-data interdependency and data isolation, limited the flexibility of file processing systems in providing users with ad-hoc information requests
6. Incompatible file formats: As the structure of files is embedded in the application programs, the structures are dependent on the application programming language. For example, the structure of a file generated by a COBOL program may be different from the structure of a file generated by a ‘C’ program. The direct incompatibility of such files makes them difficult to process jointly.
7. Data Security. The security of data is low in file based system because, the data is maintained in the flat file(s) is easily accessible. For Example: Consider the Banking System. The Customer Transaction file has details about the total available balance of all customers. A Customer wants information about his account balance. In a file system it is difficult to give the Customer access to only his data in the· file. Thus enforcing security constraints for the entire file or for certain data items are difficult.
8. Transactional Problems. The File based system approach does not satisfy transaction properties like Atomicity, Consistency, Isolation and Durability properties commonly known as ACID properties.
9. Concurrency problems. When multiple users access the same piece of data at same interval of time then it is called as concurrency of the system. When two or more users read the data simultaneously there is ll( problem, but when they like to update a file simultaneously, it may result in a problem.
10. Poor data modeling of real world. The file based system is not able to represent the complex data and interfile relationships, which results poor data modeling properties.
In Entity-Relationship model a database is modeled as a collection of entities and relationship among entities. The ER model views the real world as a construct of entities and association between entities.
An entity is an object whose information is stored in the database. It is distinguishable from other objects. For example: specific person, company, event, plant. In other words, any thing that may ‘have an independent existence and about which we intend to collect data is known as Entity. It is also known as Entity type. Entities are the principal data object about which information is to be collected. Entities are usually recognizable concepts, either concrete or abstract, such as person, places, things, or events, which have relevance to the database. Some specific examples of entities are EMPLOYEES, PROJECTS, and INVOICES. An entity is analogous to a table in the relational model. Entities are classified as independent or dependent (in some methodologies, the terms used are strong and weak, respectively). An independent entity is one that does not rely on another for identification. A dependent entity is one that relies on another for identification. An entity occurrence (also called an instance) is an individual occurrence of an entity. An occurrence is analogous to a row in the relational table.
An entity set is a set of entities of the same type that share the same properties. For example: set of all persons, companies, trees, holidays. In object oriented terminology entity is considered as an object and entity set is considered as a class. For example, Student is an entity set or class and a particular student is an entity or an object.
Special Entity Types
Associative entities (also known as intersection entities) are entities used to associate two or more entities in order to reconcile a many-to-many relationship. Subtypes entities are used in generalization hierarchies to represent a subset of instances of their parent entity, called the super type, but which have attributes or relationships that apply only to the subset. Associative entities and generalization hierarchies are discussed in more detail in next sections.
A Relationship represents an association between two or more entities. Relationships are classified in terms of degree, connectivity, cardinality, and existence. An example of a relationship would be: • Employees are assigned to projects • Projects have subtasks • Departments manage one or more projects
Attributes describe the properties of the entity of which they are associated. A particular instance of an attribute is a value. For example, “Ram” is one value of the attribute Name. The domain of an attribute is the collection of all possible values an attribute can have. The domain of Name is a character string.
We can classify attributes as following types: • Simple • Composite • Single-values • Multi-values • Derived Simple Attribute: A simple attribute is an attribute composed of a single component with an independent existence. Simple attributes cannot be further subdivided. Examples of simple attributes include Sex, Age, and Salary etc. Simple attributes are sometimes called atomic attributes. Composite Attribute: An attribute composed of multiple components, each with an independent existence is called a composite attribute. Some attributes can be further divided to yield smaller components with an independent existence of their own. For example, the Address attribute can be composed of components like Street number, Area, City, Pin code and so on. The decision to model the Address, Area, and City etc. is dependent on whether the user view of the model refers to the Address attribute as a single unit or as individual components. Composite attributes are shown. Single-valued Attribute: A single-valued attribute is one that holds a single value for a single entity. The majority of attributes are single-valued for a particular entity. For example, the Classroom entity has as single value for the room_number attribute and therefore the room_number attribute is referred to as being single-valued. Multi-valued Attribute: A multi-valued attribute is one that holds multiple values for a single entity. Some attribute has multiple values for a particular entity. For example, a student entity can have multiple values for the Hobby attribute-reading, music, movies and so on. A multi-valued attribute may have set of numbers with upper and lower limits. For example, the Hobby attribute of a Student may have between one and five values. In other words, a student may have a minimum of one hobby and maximum of 5 hobbies. Derived Attribute: A derived attribute is one that represents a value that is derivable from the value of a related attribute or set of attributes, not necessarily in the same entity. Some attributes may be related for a particular entity. For example the Age attribute can be derived from the date-of-birth attribute and therefore they are related. We refer the age attribute as a derived attribute, the value of which is derived from the date-of-birth attribute.
Degreeof a Relationship
The degree of a relationship is the number of entities associated with the relationship. The n-ary relationship is the general form for degree n. Special cases are the binary, and ternary, where the degree is 2, and 3, respectively. Binary relationships, the association between two entities are the most common type in the real world. A recursive binary relationship occurs when an entity is related to itself. An example might be “some employees are married to other employees”. A ternary relationship involves three entities and is used when a binary relationship is inadequate. Many modeling approaches recognize only binary relationships. Ternary or n-ary relationships are decomposed into two or more binary relationships.
Connectivity and Cardinality
The connectivity of a relationship describes the mapping of associated entity instances in the relationship. The values of connectivity are “one” or “many”. The cardinality of a relationship is the actual number of related occurrences for each of the two entities. The basic types of connectivity for relations are: • One to One (1: 1). • One to Many (1:M) • Many to One (M:l) • Many to Many (M:M) A one-to-one (1:1) relationship is when at most one instance of an entity A is associated with one instance of entity B. For example, “employees in the company are each assigned their own office. For each employee there exists a unique office and for each office there exists a unique employee. A one-to-many (1:M) relationships is when for one instance of entity A, there are zero, one, or many instances of entity B, but for one instance of entity B, there is only one instance of entity A. Examples of l:M relationships are: • A department has many employees. • Each employee is assigned to one department. A many-to-one (M: 1) relationships is when for one instance of entity A is associated with at most one instances of entity B, but for one instance of entity B, there may be any number of instances of entity A. Examples of M: 1 relationships is • Many employees one department. A many-to-many (M: M) relationship, sometimes called non-specific, is when for one instance of entity A, there are zero, one, or many instances of entity B and for one instance of entity B there are zero, one, or many instances of entity A. Examples are: • Employees can be assigned to no more than two projects at the same time. • Projects must have assigned at least three employees. A single employee can be assigned to many projects; conversely, a single project can have assigned to it many employees. Here, the cardinality for the relationship between employees and projects is two and the cardinality between project and employee is three. Many-to-many relationships cannot be directly translated to relational tables but instead must be transformed into two or more one-to-many relationships using associative entities.
The direction of a relationship indicates the originating entity of a relationship. The entity from which a relationship originates is the parent entity; the entity where the relationship terminates is the child entity. The type of the relation is determined by the direction of line connecting relationship component and the entity. To distinguish different types of relation, we draw either a directed line or an undirected line between the relationship set and the entity set. Directed line is used to indicate one occurrence and undirected line is used to indicate many occurrences in a relation as shown in next case. To illustrate these different types of relationships consider the following entity sets: DEPARTMENT, MANAGER, EMPLOYEE, PROJECT. The relationship between a DEPARTMENT and a MANAGER is usually one-to-one; there is only one manager per department and a manager manages only one department. This relationship between entities is shown in figure 4.3. Each entity is represent by a rectangle and a direct line indicates the relationship between them. The relationship for MANAGER to DEPARTMENT and from DEPARTMENT to MANAGER is both 1:1. Note that a one to one relationship between two entity set does not imply that for an occurrence of an entity from one set at any time there must be an occurrence of an entity in the other set. In the case of an organization, there could be times when a department is without a manager or when an employee who is classified as a manager may be without a department to manage. Some instance of one to one relationship between the entities DEP ARTMENT and MANAGER. A one to many, relationship exists from the entity MANAGER to the entity EMPLOYEE because there are several employees reporting to the manager. As we just pointed out, there could be an occurrence of the entity type MANAGER having zero occurrences of the entity type EMPLOYEE reporting to him or her. A reverse relationship, from EMPLOYEE to MANAGER, would be many to one, since a single manager may supervise many employees However, given an instance of the entity set EMPLOYEE, there could be only one instance of the entity set MANAGER to whom that employee reports (assuming that no employee reports to more than one manager). The relationship between entities is illustrated and figure shows some instances of this relationship. The relationship between the entity EMPLOYEE and the entity PROJECT can be derived as follows: Each employee could be involved in a number of different projects, and a number of employees could be working on a given project. This relationship Between EMPLOYEE and PROJECT is many to many. It is illustrated and shows some instances of such a relationship.
Entity Relationship Diagram:
1. Gathering information: This could be a written document that describes the system in question with reasonable amount of details.
2. Producing ERD: ERD or Entity Relationship Diagram is a diagrammatic representation of the description we have gathered about the system.
3. Designing the database: Out of the ERD we have created, it is very easy to determine the tables, the attributes which the tables must contain and the relationship among these tables.
4. Normalization: This is a process of removing different kinds of impurities from the tables we have just created in the above step.
How to Prepare an ERD
Step 1 Let us take a very simple example and we try to reach a fully organized database from it. Let us look at the following simple statement: A boy eats an ice cream. This is a description of a real word activity, and we may consider the above statement as a written document (very short, of course).
Step 2 Now we have to prepare the ERD. Before doing that we have to process the statement a little. We can see that the sentence contains a subject (boy), an object (ice cream) and a verb (eats) that defines the relationship between the subject and the object. Consider the nouns as entities (boy and ice cream) and the verb (eats) as a relationship. To plot them in the diagram, put the nouns within rectangles and the relationship within a diamond. Also, show the relationship with a directed arrow, starting from the subject entity (boy) towards the object entity (ice cream). Well, fine. Up to this point the ERD shows how boy and ice cream are related. Now, every boy must have a name, address, phone number etc. and every ice cream has a manufacturer, flavor, price etc. Without these the diagram is not complete. These items which we mentioned here are known as attributes, and they must be incorporated in the ERD as connected ovals. But can only entities have attributes? Certainly not. If we want then the relationship must have their attributes too. These attribute do not inform anything more either about the boy or the ice cream, but they provide additional information about the relationships between the boy and the ice cream.
Step 3 We are almost complete now. If you look carefully, we now have defined structures for at least three tables like the following: However, this is still not a working database, because by definition, database should be “collection of related tables.” To make them connected, the tables must have some common attributes. If we chose the attribute Name of the Boy table to play the role of the common attribute, then the revised structure of the above tables become something like the following. This is as complete as it can be. We now have information about the boy, about the ice cream he has eaten and about the date and time when the eating was done.
Cardinality of Relationship While creating relationship between two entities, we may often need to face the cardinality problem. This simply means that how many entities of the first set are related to how many entities of the second set. Cardinality can be of the following three types.
One-to-One Only one entity of the first set is related to only one entity of the second set. E.g. A teacher teaches a student. Only one teacher is teaching only one student. This can be expressed in the followingdiagram as:
One-to-Many Only one entity of the first set is related to multiple entities of the second set. E.g. A teacher teaches students. Only one teacher is teaching many students. This can be expressed in the followingdiagram as:
Many-to-One Multiple entities of the first set are related to multiple entities of the second set. E.g. Teachers teach a student. Many teachers are teaching only one student. This can be expressed in the followingdiagram as:
Many-to-Many Multiple entities of the first set is related to multiple entities of the second set. E.g. Teachers teach students. In any school or college many teachers are teaching many students. This can be consideredas a two way one-to-many relationship. This can be expressed in the following diagram as: In this discussion we have not included the attributes, but you can understand that they can be used without any problem if we want to.
The Concept of Keys
A key is an attribute of a table which helps to identify a row. There can be many different types of keys which are explained here.
Super Key or Candidate Key: It is such an attribute of a table that can uniquely identify a row in a table. Generally they contain unique values and can never contain NULL values. There can be more than one super key or candidate key in a table e.g. within a STUDENT table Roll and Mobile No. can both serve to uniquely identify a student.
Primary Key: It is one of the candidate keys that are chosen to be the identifying key for the entire table. E.g. although there are two candidate keys in the STUDENT table, the college would obviously use Roll as the primary key of the table.
Alternate Key: This is the candidate key which is not chosen as the primary key of the table. They are named so because although not the primary key, they can still identify a row.
Composite Key: Sometimes one key is not enough to uniquely identify a row. E.g. in a single class Roll is enough to find a student, but in the entire school, merely searching by the Roll is not enough, because there could be 10 classes in the school and each one of them may contain a certain roll no 5. To uniquely identify the student we have to say something like “class VII, roll no 5”. So, a combination of two or more attributes is combined to create a unique combination of values, such as Class + Roll.
Foreign Key: Sometimes we may have to work with an attribute that does not have a primary key of its own. To identify its rows, we have to use the primary attribute of a related table. Such a copy of another related table’s primary key is called foreign key.
Strong and Weak Entity Based on the concept of foreign key, there may arise a situation when we have to relate an entity having a primary key of its own and an entity not having a primary key of its own. In such a case, the entity having its own primary key is called a strong entity and the entity not having its own primary key is called a weak entity. Whenever we need to relate a strong and a weak entity together, the ERD would change just a little. Say, for example, we have a statement “A Student lives in a Home.” STUDENT is obviously a strong entity having a primary key Roll. But HOME may not have a unique primary key, as its only attribute Address may be shared by many homes (what if it is a housing estate?). HOME is a weak entity in this case. The ERD of this statement would be like the following As you can see, the weak entity itself and the relationship linking a strong and weak entity must have double border.
Different Types of Database
There are three different types of data base. The difference lies in the organization of the database and the storage structure of the data. We shall briefly mention them here.
This is our subject of study. A DBMS is relational if the data is organized into relations, that is, tables. In RDBMS, all data are stored in the well-known row-column format.
In HDBMS, data is organized in a tree like manner. There is a parent-child relationship among data items and the data model is very suitable for representing one-to-many relationship. To access the data items, some kind of tree-traversal techniques are used, such as preorder traversal. Because HDBMS is built on the one-to-many model, we have to face a little bit of difficulty to organize a hierarchical database into row column format. For example, consider the following hierarchical database that shows four employees (E01, E02, E03, and E04) belonging to the same department D1. There are two ways to represent the above one-to-many information into a relation that is built in one-to-one relationship. The first is called Replication, where the department id is replicated a number of times in the table like the following. Replication makes the same data item redundant and is an inefficient way to store data. A better way is to use a technique called the Virtual Record. While using this, the repeating data item is not used in the table. It is kept at a separate place. The table, instead of containing the repeating information, contains a pointer to that place where the data item is stored. This organization saves a lot of space as data is not made redundant. Network DBMS The NDBMS is built primarily on a one–to-many relationship, but where a parent-child representation among the data items cannot be ensured. This may happen in any real world situation where any entity can be linked to any entity. The NDBMS was proposed by a group of theorists known as the Database Task Group (DBTG). What they said looks like this: In NDBMS, all entities are called Records and all relationships are called Sets. The record from where the relationship starts is called the Owner Record and where it ends is called Member Record. The relationship or set is strictly one-to-many. In case we need to represent a many-to-many relationship, an interesting thing happens. In NDBMS, Owner and Member can only have one-to-many relationship. We have to introduce a third common record with which both the Owner and Member can have one-to-many relationship. Using this common record, the Owner and Member can be linked by a many-to-many relationship. Suppose we have to represent the statement Teachers teach students. We have to introduce a third record, suppose CLASS to which both teacher and the student can have a many-to-many relationship. Using the class in the middle, teacher and student can be linked to a virtual many-to many relationship.