April 2, 2015

Database

I wrote my notes while studying for database design in markdown so I thought I might as well share it here.

Based on the slides publicly available from Database System Concepts Chapter 1-4


Database Notes

Chapter 1 - Intro to Databases

DBMS

  • Set of programs to access data
  • Environment both convenient and efficient to use

Drawbacks of filesystems as DB

  • Data redundancy/duplication and inconsistency of info
  • Difficult to access data
  • Data isolation
    • Data in multiple files and formats
  • Integrity problems
    • Constraints in code instead of being stated explicitly
    • Hard to add new constraints or change existing ones
  • No atomicity
    • Partial updates may be carried out in event of failure
  • Concurrent access by multiple users
    • For performance
    • Uncontrolled concurrent access leads to inconsistencies
  • Security and access control issues

DBMS uses ACID principle

  • Atomicity
    • If one part of transaction fails, entire transaction should fail
  • Consistency
    • Any transaction can only change the database in allowed ways
  • Isolation
    • Concurrent transactions result in system state identical to if transactions were executed serially
  • Durability
    • Results of a transaction are stored permanently even in the event of power loss, crashes or errors

DBMS Levels of abstraction

  • Physical level
    • Describes how a record is stored on the physical disk
  • Logical level
    • Describes how data is stored in a database and the relationships between data
    • SQL create table statement
  • View level
    • Applications hide details of data types
    • Hides information for security purposes

Instances and Schemas

  • Physical Data Independence
    • Ability to modify the physical level of database design without changing the logical schema
    • Applications depend on logical level of database design (schema)

Relational data model

  • Tabular data with rows (records) and columns (fields)
  • Data can have relationships with other tables’ data

Database language

  • Data Defnition Language (DDL)
    • Language for defining database schema
    • Create
    • Includes integrity constraints like primary key and referential intergrity
  • Data Manipulation Language (DML)
    • Language for manipulating data
    • Insert / Delete / Update
    • Procedural
      • User specifies what data is required and how to get those data
    • Non-procedural (Declarative)
      • User specifies what data is required
      • SQL
  • Query Language
    • Language for accessing data
    • Select
  • SQL has all three

Database Design

  • If values in a field are being repeated many times, a new table should be created with a relation back to the original table

Chapter 2 - Intro to Relational Model

Relation

  • Attributes / Fields / Columns
    • Domains are the set of allowed values for each attribute
    • null is a member of every domain
  • Example of relation
    • instructor = (ID, name, dept_name, salary)

Database

  • Consists of multiple relations
  • Information about an enterprise broken up into parts
  • Bad design eg. univ = (instructor_ID, name, dept_name, salary, student_ID, …) results in:
    • Repetition of information (eg. two students with the same instructor)
    • Need for null values (eg. student with no advisor)

Keys

  • K is a superkey of R if values of K can uniquely identify a tuple (record) of each possible relation r(R)
    • eg. {ID} and {ID,name} are superkeys of instructor
  • Superkey K is a candidate key if K is minimal
    • eg. {ID} is a candidate key
  • Candidate key chosen by database designer as principal means of identifying tuples within a relation is a primary key
    • eg. {ID} chosen as primary key
  • Foreign key constraint / integrity means that the value in one relation must appear in another as the referencing / referenced relation

Relational Query Language

  • Selection of tuples / records / rows
    • Select a subset of all rows
  • Selection of columns / attributes / fields
    • Select a subset of all columns
    • Discard duplicate values
  • Joining two relations - Cartesian Product
    • For every row in r
      • For every row in s
        • Create new row in r x s with all columns from both relations
  • Union of two relations
  • Set difference of two relations
    • r exclude s
  • Set intersection of two relations
  • Joining two relations - Natural Join
    • For every row in s
      • If row in s has values in the same columns as a row in s, create new row with matched columns and append other existing columns’ values

Chapter 3 - SQL

SQL Data Definition Language (DDL)

Create Table

create table instructor(
    ID          char(5),
    name        varchar(20) not null,
    dept_name   varchar(20),
    salary      numeric(8,2),
    primary key(ID),
    foreign key(dept_name) references department);

not null is an integrity constraint, name cannot take a null value

create table takes(
    ID          varchar(5),
    course_id   varchar(8),
    sec_id      varchar(8),
    semester    varchar(6),
    year        numeric(4,0),
    grade       varchar(2),
    primary key(ID, course_id, sec_id, semester, year),
    foreign key(ID) references student,
    foreign key(course_id, sec_id, semester, year) references section);

Since sec_id is in the primary key, a student can be registered for two sections of the same course in the same semester and year

Drop

drop table student Deletes entire table and contents

Alter

alter table student add name varchar(20) alter table student drop name Dropping of attributes not usually supported

SQL Data Manipulation Language (DML)

Basic query structure

select [attribute 1], [attribute 2], ...
from [relation 1], [relation 2], ...
where [predicate]

Result of an SQL query is a relation

Select

select name from instructor Lists the attributes desired in the result of a query, names are case-insensitive

select distinct dept_name from instructor Removes duplicates; the keyword all instead of distinct will specify instead that duplicates not be removed

select * from instructor * denotes all attributes

select ID, name, salary/12 from instructor Arithmetic expressions allowed in select; in this case salary is divided by 12 when returned

Where

Specifies conditions the result must satisfy (selection predicate)

select name from instructor where dept_name = 'Comp. Sci.' and salary > 80000 Logical connectives and, or, not can be combined with comparisons

From

Lists relations involved in the query; creates a Cartesian product of all the relations involved

select * from instructor, teaches Returns every possible instructor-teaches pair with all attributes from both relations

Joins

select section.course_id, semester, year, title from section, course where section.course_id = course.course_id and dept_name = 'Comp. Sci.'

Natural Joins

select * from instructor natural join teaches

Beware of unrelated attributes with the same name getting equated Incorrect: select name, title from instructor natural join teaches natural join course Correct: select name, title from (instructor natural join teaches) join course using (course_id)

Rename

select ID, name, salary monthly_salary from instructor Refers to salary as monthly_salary instead

select distinct T.name from instructor T, instructor S where T.salary > S.salary and S.dept_name = 'Comp. Sci.' Find names of all instructors who have higher salary than some instructor in ‘Comp. Sci.’; Self join to compare records in same table

Like

select name from instructor where name like '%da\%r_' escape '\' % matches any substring, _ matches any character, escape defines an escape character

Order by

select distinct name from instructor order by dept_name, name desc

Between

select name from instructor where salary between 90000 and 100000

Set operations

(select course_id from section where sem='Fall') union (select course_id from section where sem='Spring') Set operations union, intersect and except automatically eliminate duplicates, to retain them use union all, intersect all and except all

Null

select name from instructor where salary is null null signifies an unknown value or a value that does not exist

Aggregate Functions

select avg(salary) from instructor where dept_name = 'Comp. Sci.' Aggregate functions avg, count, min, max, sum return a value; all aggregate functions exclude null values except count(*) which counts the number of records in a table

select dept_name, avg(salary) from instructor group by dept_name Resulting relation has columns dept_name and avg_salary with each row being the average salary of instructors for that department

select dept_name, avg(salary) from instructor group by dept_name having avg(salary) > 42000 Predicates in the having clause are applied after the formation of groups while predicates in the where clause are applied before forming groups

Nested Subqueries

Intersection

select distinct course_id
from section
where semester = ’Fall’ and year= 2009 and 
        course_id in (select course_id
                    from section
                    where semester = ’Spring’ and year= 2010);

Exclusion

select distinct course_id
from section
where semester = ’Fall’ and year= 2009 and 
        course_id not in (select course_id
                        from section
                        where semester = ’Spring’ and year= 2010);

Comparing with some (at least one) records in another query

select name
from instructor
where salary > some (select salary
                    from instructor
                    where dept_name = ’Biology’);

Comparing with all records in another query

select name
from instructor
where salary > all (select salary
                    from instructor
                    where dept_name = ’Biology’);

Check if subquery is nonempty

select course_id
from section as S
where semester = ’Fall’ and year= 2009 and 
        exists (select *
                from section as T
                where semester = ’Spring’ and year= 2010 
                        and S.course_id= T.course_id);

Check if subquery is empty

select distinct S.ID, S.name
from student as S
where not exists ( (select course_id
                    from course
                    where dept_name = ’Biology’)
                except
                    (select T.course_id
                    from takes as T
                    where S.ID = T.ID));

Tests for absence of duplicate tuples in result; returns true on empty set

select T.course_id
from course as T
where unique (select R.course_id
            from section as R
            where T.course_id = R.course_id
                    and R.year = 2009);
Subqueries in From clause
select dept_name, avg_salary
from (select dept_name, avg (salary) as avg_salary
        from instructor
        group by dept_name)
where avg_salary > 42000;

With

Defines a temporary view (relation) only available to the query where with occurs; in this case, max_budget is the temporary table

with max_budget(value) as
    (select max(budget)
    from department)
select budget
from department, max_budget
where department.budget=max_budget.value;

Delete

delete from instructor where dept_name = 'Finance'

delete from instructor where salary > (select avg(salary) from instructor) avg is computed for every tuple, as delete runs, avg changes; compute avg first then delete

Insert

insert into instructor values('10211', 'Smith', 'Biology', 66000); Follows the order of fields declared in the create statement

insert into instructor (ID, name, dept_name, salary) values('10211', 'Smith', 'Biology', 66000); If cannot remember order of the attributes

insert into student select ID, name, dept_name, 0 from instructor Inserts all instructors into the student relation with tot_creds set to 0

Update

update instructor set salary = salary * 1.03 where salary > 100000; Increases instructors’ salaries whose salary is above 100000 by 1.03 times

update instructor
set salary = case
                when salary <= 100000 then salary * 1.05
                else salary * 1.03
            end

Increases instructors’ salaries depending on their salary


Chapter 4 - Intermediate SQL

Outer join

Similar to join but uses null values for tuples that to not match tuples in the other relation; avoids loss of information course left outer join prereq on course.course_id = prereq.course_id course natural right outer join prereq course natural full outer join prereq

Joined relations

  • Join condition
    • Defines which tuples in the two relations match and which attributes are in the result of the join
    • inner join, left outer join, right outer join, full outer join
  • Join type
    • Defines how tuples in each relation that do not match any tuple in the other relation are treated
    • natural, on <predicate\>, using (attributes)

Views

create view v as <query expression> View v is a virtual relation not a new relation, the expression is saved into view v and substituted into queries

create view faculty as
    select ID, name, dept_name
    from instructor
create view biology_faculty as
    select ID, name
    from faculty
    where dept_name = 'Biology'

Views can be defined from other views

Integrity constraints

Unique

unique (attributes) States that attributes form a candidate key

Check

check (predicate) Ensures predicate is satisfied

create table section (
    course_id varchar (8),
    sec_id varchar (8),
    semester varchar (6),
    year numeric (4,0),
    building varchar (15),
    room_number varchar (7),
    time slot id varchar (4), 
    primary key (course_id, sec_id, semester, year),
    check (semester in (’Fall’, ’Winter’, ’Spring’, ’Summer’))
); 

Referential integrity

  • Ensures value that appears in one relation for a given set of attributes also appears for a certain set of attributes in another relation
    • eg. if “Biology” appears in one of the tuples in instructor, then there should exist a tuple in department for “Biology”
    • dept_name is then a foreign key of instructor and a primary key of department
Cascade
create table course (
    course_id varchar(8)
    ...
    dept_name varchar(20),
    foreign key (dept_name) references department (dept_name)
        on delete cascade
        on update cascade (or set null),
    ... );

Instead of simply rejecting a delete of Biology department, on delete cascade deletes all courses the Biology department offers and on update cascade updates fields accordingly; alternatives for cascade are set null and set default

Deferrable

create table person (
    ID char(10),
    name char(40),
    mother char(10),
    father  char(10),
    primary key ID,
    foreign key father references person(ID) deferrable,
    foreign key mother references  person(ID) deferrable)

Defers constraint checking until the end of a transaction

Default

create table student (
    ID varchar (5),
    name varchar (20) not null,
    dept_name varchar (20),
    tot_cred numeric (3,0) default 0,
    primary key (ID) )

Provides default value if no value is given

Index creation

create index studentID_index on student(ID) Data structure used to speed up access to records, queries which specify a value for student.ID can now be executed by using the index without looking at all records of student

Built-in data types in SQL

  • date
    • date '2015-04-01'
  • time
    • time '17:00:30'
    • time '17:00:30.75'
  • timestamp
    • timestamp '2015-04-01 17:00:30.75'
  • interval
    • interval '1' day
    • interval '3-2' year to month
    • Subtraction between above 3 types gives an interval
    • interval can be added to above 3 types

User-defined types

create type Dollars as numeric(12,2)

create table department
    (dept_name varchar (20),
    building varchar (15),
    budget Dollars);

Domains

create domain degree_level varchar(10)
    check(value in ('Bachelors', 'Masters', 'Doctorate'));
  • Domains can have constraints such as not null but types cannot
  • Domains are not strongly typed and a value of one type can be transferred to one of another type

Large object types

  • For pictures, videos, CAD files etc.
  • blob: binary large object whose interpretation is left to an application outside of the database system
  • clob: character large object is a large collection of character data
create table pictures(
    name varchar(12),
    pic blob(32MB));
select pic
from pictures
where name like '%SNU%';
- ksami
Tags: , ,