Software Engineering
#software development activities required a systematic and disciplined approach i.e. Engineering Approach.
#Quality is not a feature which can be added to software after creating it! Quality must be built into software from the first day of development.
#SDLC models ---
Waterfall model
Incremental/Iterative model
Spiral model
#Requirement analysis
Activities:
Define the scope of the work
Understand processes
Focus on what needs to be automated
Understand functional and non-functional requirements
Document and Verify Requirements
Deliverables:
Software Requirement Specification(SRS)
Acceptance Test Plan
System Test Plan
#Functional Requirements:
Number of passengers booked per transaction should be max 6
SMS confirmation has to be sent to the registered mobile number on completing a successful transaction
Non Functional Requirements:
Time taken to complete a transaction should not be more than 3 minutes
1000 users should be able to use the application simultaneously
#waterfall model - design
Activities:
Create blueprint of the software
Define structure and behaviour of modules
Deliverables:
High Level Design
Detailed Design
Integration Test Plan
Unit Test Plan
#Coding
Activities:
Build the software
Unit test
Deliverables:
Unit tested code
#Configuration Management is the task of tracking and controlling changes in the software. It is achieved through version control tools like git, svn etc.
Configuration Management ensures:
We are using the right version
Deliverables are consistent with each other
Activity -
Version COntrol
Simultaneous USage
Change COntrol
Backup
# A class without any instance variable also follows Abstraction design principle
#Testing -
Activities:
Ensure that requirements are met
Deliverables:
Integrated and System tested software
#
Unit testing = developers
Integration testing = Development team and testing team
System Testing = Test team
Acceptance Testing = Customer
#We write "Unit A Driver" to test "Unit B" and "Unit C"
We write "Unit B stub" to test "Unit A"
#Regression testing =
While doing changes programmers might end up introducing some new defects in the software. Testing and fixing such defects is known as Regression testing.
In the maintenance phase the product must be tested against previous test cases.
#Performance Testing =
1] Load testing == Maximum operating capacity
2] Endurance testing == sustain continuous expected load
3] Stress Testing == determine a breaking point of system (stability) (expected situation)
4] Spike testing == ability to handle dramatic changes of load (unexpected situation)
#Deployment -
Activities:
Assemble
Install
End-user Training and
Sign-off
Deliverables:
Implemented Software
User manual
#Waterfall model is not suitable for:
Small projects
Complex projects
Accommodating changes (Correct Answer)
Maintenance Projects
#TOp down testing uses Stubs
#Agile software development is a group of software development methods in which requirements and solutions evolve through collaboration between self-organizing, cross-functional teams. It promotes adaptive planning, evolutionary development, early delivery, continuous improvement, and encourages rapid and flexible response to change.
#Sacrum Cycle ==
Product Backlog: The product backlog comprises an ordered list of requirements that a scrum team maintains for a product. It consists of features, bug fixes, functional, non functional requirements, change requests, enhancements etc.
Sprint Backlog: The sprint backlog is the list of tasks the development team must address during the upcoming sprint.
Sprint: A sprint (or iteration) is the basic unit of development in scrum. The sprint is a timeboxed effort; that is, it is restricted to a specific duration.
Working increment of software: The increment (or potentially shippable increment, PSI) is the sum of all the product backlog items completed during a sprint and all previous sprints.
#Sprint is time boxed development period in Scrum.
Each sprint starts with a sprint planning: event that identify the work for the sprint, and make an estimated commitment for the sprint goal.
Each sprint produces the planned increment of the product: outcome could be a working code.
Inspect: the client performs the acceptance testing during inspection and provides feedback.
Adapt: the team incorporates the feedback provided by the client to improve the work during subsequent sprint.
CYCLE = PLAN > PRODUCE > INSPECT > ADAPT
#Agile adoption results in frequent delivery of features to end users. Thus testing has to be performed more often. It also increases the probability of defects through manual errors. Most often there is no time to manually test the entire application as releases could often be only weeks apart.
Test automation is a key strategy to achieve faster delivery cycles. Agile teams use tools like xUnit, QTP, Selenium etc. to automate Unit testing, Functional testing, Regression testing etc. Test cases are written as code using these frameworks and can be executed at every build to detect bugs.
Another practice usually employed in agile projects is Continuous Integration. It essentially means that developers integrate their code to a common region frequently (often daily or multiple times a day). It's main aim is to detect and fix integration problems at an early stage.
#
Traditional Methods
Agile Methods
Predictive
Adaptive
Requirement changes are resisted Changes are welcome
Limited Interaction with Business/ Product and Testing Collaboration between various teams
Single Delivery
Multiple Short / Small Deliveries
Process and Documentation Focus
Communication Focus
#Advantages of Agile ==
1] Time to Market: Early and regular releases
2] Revenue: Early Return On Investment (ROI)
3] High Quality and Productivity: Testing is integrated throughout the cycle
4] Business Engagement/Customer Satisfaction: Active involvement of a business owner and the high visibility of the product and progress
5] Motivated Teams: Active involvement & collaboration make Agile development more enjoyable
#Which is more important, the Product or the Process?
--Process decides the Quality of the Product. Therefore process is important.
#Quality Control
Focus is on the Product
QC measures a product against the existence of a required attribute
Major QC activity is identifying defects and correcting them (Rework)
Inspections, Reviews and Testing activities
Quality Assurance
Focus is on the Process rather than Product
QA ensures "Fitness for Purpose"
Auditing and Reporting functions
Building process guidelines, checklists, templates, Training activities
#Cost of Quality (CoQ) is a measure of costs (effort) incurred in ensuring quality. These costs include:
Prevention cost such as training, process improvement etc.
Appraisal cost such as reviews, testing etc.
Correction/Rework/Failure cost such as rework to fix defects after testing etc.
CoQ is typically measured as percentage of effort spent to ensure quality compared to total project effort. Thus CoQ of 31.38% means that compared to overall effort, the team has spent 31.38% of effort to ensure quality.
CoQ = ((Prevention cost + Appraisal cost + Failure cost) x 100) / Total Project effort
Appraisal Cost, Failure Cost, Prevention Cost and Total Project Effort are all measured in person months.
#Ensures all requirements are implemented in the software. Requirement Traceability Matrix (RTM) maintains reference between: (Eg - page 60)
Requirement Number
Requirement Description
HLD Reference and DD Reference
Unit Test Plan Reference, Integration Test Plan Reference
System Test Plan Reference
Horizontal traceability ensures all requirements are incorporated into product.
Vertical traceability ensures no unnecessary functionality is included unless specifically called for by a requirement.
#defect classification -
1] logical
2] User Interface
3] Maintainability
4] Standards
#If a requirement defect slips from requirements stage unnoticed, and gets caught during later phases, it becomes much more costlier to fix them.
#Pareto principle states that, for many events, roughly 80% of the effects come from 20% of the causes. In software development context 80% of the defects are due to 20% of the causes.
#Basic MAtrics at Infosys:
1] Efforts
2] Defects
3] Schedule
4] Size
#Other derived Metrics used at Infosys
Defect Injection Rate
Defect Removal Efficiency
Review Effectiveness
Turn Around Time
#Process Infrastructure:
1] PRIDE
2] IPM+
3] ENCORE
DataBase
#Reference =
http://ilp/fp2/Generic/DBMS/page/78
http://ilp/fp2/Generic/DBMS/page/99
#
--CREATE TABLE Department(D_id number(3))
--SELECT * FROM user_tables
--SELECT * FROM TAB
--ALTER TABLE department ADD d_name varchar2(30)
--DROP TABLE department
--DESC department
--alter table Department add constraint Dept_UK Unique(Dept_Name)
--alter table Department rename constraint <old_name> to <new_name>
--SELECT EName, Salary * 2 AS Double_Salary FROM Employee
--SELECT EName, 30 AS Value FROM Employee
--SELECT DISTINCT Dept FROM Employee (NULL is also distinct)
--SELECT DISTINCT Dept, Manager FROM EMPLOYEE (for multiple columns distinct)
--SELECT ID, ENAME FROM Employee WHERE SALARY BETWEEN 30000 and 50000
--UPDATE Employee SET SALARY = SALARY * 1.3, BONUS = SALARY * 0.30 WHERE ID = 1
--TRUNCATE table Employee
--DELETE FROM Employee
--DELETE FROM Employee WHERE Id = 5
--select empid, salary AS "Current Salary", (salary * 1.1) AS "New Salary", (salary * 0.1) AS "Incremented Amount" from empdetails
--SELECT City, Country, CONCAT(City, Country) "CONCAT", City || Country "ConcatByOperator", CONCAT(CONCAT(City, ', '), Country) "NestedConcat" FROM Weather;
--SELECT RecordDate, SUBSTR(RecordDate,1,2) "DAY", SUBSTR(RecordDate,4,3) "MONTH", SUBSTR(RecordDate,8) "YEAR" FROM Weather;
--SELECT MinTemp, TO_CHAR(MinTemp) DEF_FORMAT, TO_CHAR(MinTemp, '999.99') "FIXED_DIGITS", TO_CHAR(MinTemp, '9,9.99') "COMMA" FROM Weather;
--SELECT '1000.98' "ORIG_NOFORMAT", TO_NUMBER('1000.98') "CONV_NOFORMAT", '1,000.98' "ORIG_FORMAT", TO_NUMBER('1,000.98', '9,999.99') "CONV_FORMAT" FROM DUAL;
--SELECT RecordDate, TO_CHAR(RecordDate, 'MON') "MONTH", TO_CHAR(RecordDate, 'Month') "FULL_MONTH", TO_CHAR(RecordDate, 'Dy') "DAY", TO_CHAR(RecordDate, 'Day') "FULL_DAY" FROM Weather;
--SELECT TO_CHAR(RecordDate) DEF_FORMAT, TO_CHAR(RecordDate, 'DD/MM/CCYY') INDIAN, TO_CHAR(RecordDate, 'MM/DD/YY') AMERICAN FROM Weather;
--SELECT '01-Jan-2014' DATE_STRING, TO_DATE('01-Jan-2014') CONV_NOFORMAT, TO_DATE('01-Jan-2014', 'DD-Mon-YYYY') CONV_FORMAT FROM DUAL
--SELECT 'Jan-01-2014' DATE_STRING, TO_DATE('Jan-01-2014', 'Mon-DD-YYYY') CONV_FORMAT FROM DUAL
--SELECT COUNT(Dept) Count1, COUNT(DISTINCT Dept) Count2 FROM Employee
--SELECT CITY, NVL(CITY, 'Not Available') NVL_CITY, MINTEMP, NVL(MINTEMP, 0.0) NVL_MINTEMP, NVL(TO_CHAR(MINTEMP), 'Not Available') NVL_MINTEMP2 from Weather;
--select count(*) sale_count from sale where add_months(sldate, 40) > sysdate #query to display the number of sales that were made in the last 40 months
--insert all
into Student values(20, 'Amit')
into Student values(21, 'Nik')
into Student values(22, 'Raj')
--SELECT Id, EName, Designation, Salary,
CASE Designation
WHEN 'SE' THEN Salary * 1.2
WHEN 'SSE' THEN Salary * 1.1
ELSE Salary * 1.05
END New_Salary
FROM Employee;
--SELECT EName, Designation, Bonus,
CASE
WHEN Designation = 'SE' THEN Bonus + 500
WHEN Designation = 'SSE' THEN Bonus + 1000
WHEN Designation = 'PM' THEN Bonus + 2000
ELSE Bonus
END AS NewBonus
FROM Employee;
--SELECT Id, EName, Designation, Salary,
CASE
WHEN Designation = 'SE' OR Designation = 'SSE' THEN TO_CHAR(Salary * 1.2)
WHEN Designation = 'PM' AND Salary >= 90000 THEN 'No hike'
ELSE TO_CHAR(Salary * 1.05 )
END New_Salary
FROM Employee;
--Select ID, ENAME, DOJ, SALARY, DEPT, DESIGNATION FROM Employee ORDER BY DEPT ASC, DESIGNATION DESC
--select to_char(sldate, 'Month') MONTH, count(*) NUmber_sale from sale group by to_char(sldate, 'Month') order by 2 desc
--select prodid, sum(quantity) qty_sold from saledetail where quantity > 1 group by prodid having count(*) > 1
--select distinct Itemtype,
case
when price between 0 and 499 then 'Cheap'
when price between 500 and 1999 then 'Affordable'
when price between 2000 and 4999 then 'Expensive'
when price >= 5000 then 'Very Expensive'
end as classification
from item
order by 1,2
--select to_char(to_date('Jan/10/2015', 'Mon/DD/YYYY'), 'Month') month, to_number('2,50,000.00', '9,99,999.99') amount from dual
--select pymtmode, count(*) pymtcount from orders where to_number(to_char(pymtdate, 'YY'),'99') < 15 group by pymtmode having count(*) > 1
--select to_char(sysdate, 'Mon/DD/YYYY Day') currentdate from dual
--select orderid, round(abs(months_between(orderdate, pymtdate)* 30)) noofdays from orders
--select count(distinct itemtype) noofitemtypes from item
#A file-based approach suffers following problems:
1] Dependency of program on physical structure of data
2] Complex process to retrieve data
3] Loss of data on concurrent access
4] Inability to give access based on record (Security)
5] Data redundancy
#Database is a shared collection of logically related data
Database Management System is a software system that enables users to define, create, maintain, and control access to the database.
Application Program interacts with a database by issuing an appropriate request
#DBMS offers -
Data Management, Integrity, Transaction, Concurrency, Security, Recovery and Utilities
#Types of DataBase System:
1] Hierarchical
2] Network
3] Relational
4] NoSQL
#Tables = Relations
Attributes = Columns = Fields
Rows = Records = Tuples
Number of Columns = Degree of Relation
Number of Records = Cardinality of Relation
#A relationship among a set of values in a table is represented by a "ROW"
#Data integrity refers to maintaining and assuring the accuracy and consistency of data over its entire life-cycle.
Types of Integrity Constraints:
1] Entity Integrity (Primary Key) (between rows)
2] Domain Integrity (DATA TYPES, CHECK CONSTRAINT) (between columns)
3] Referential Integrity (Foreign Key) (between tables)
#When two or more columns together identify the unique row then it's referred to as Composite Primary Key.
#Terms of Foreign KEys:
Parent Table = Master table = Referenced table
Child table = Referencing table
#ER model is a graphical representation of entities and their relationships which helps in understanding data independent of the actual database implementation.
Entity = Real world Object
Attribute = Property of Entity
#Crow foot notation is one of the ways to represent cardinality of relationship in an ER Model.
#In 1 : N relationships, the foreign key and relationship attributes are always added to the many (N) side of the relationship.
#An attribute in a table/relation can reference another attribute in the same table/relation and this is called Self Referencing Foreign Key
#CHAR = CHARACTER
VARCHAR2 = CHARACTER VARYING
#SQL supports SMALLINT, INTEGER and INT data types that are used for storing whole numbers.
SQL, unlike programming languages, does not provide support for arbitrary length numbers i.e. numbers not bound by size limits. For e.g. Python supports "bignum" and Java supports "BigInteger" data types.
#Nonintegral data types have an integer part and a fractional part. Either NUMERIC, DECIMAL or NUMBER data types can be used to store nonintegral numbers.
#SQL supports following data types for representing date and large objects:
1] DATE (default format = DD-MON-YY )
2] TIMESTAMP (Storing date data with precision up-to 1 billionth (9 digits) of a second.)
3] CLOB (Character Large Object) (More than 4000 bytes data)
4] BLOB (Binary Large Object) (movies, images with size up to 4GB)
#Not equal to = <>
#Various constraints that can be created on database tables are:
NOT NULL
PRIMARY KEY
CHECK
UNIQUE
FOREIGN KEY
We can also specify DEFAULT value for a column. Oracle database does not consider DEFAULT as a constraint.
#CREATE TABLE Student (
StudentId INTEGER CONSTRAINT stud_sid_pk PRIMARY KEY,
FName VARCHAR2(10) CONSTRAINT Stud_Fname_nn NOT NULL,
LName VARCHAR2(10),
Gender CHAR(1) CONSTRAINT Stud_gender_ck1 CHECK(Gender IN('M', 'F')),
ContactNo NUMBER(10) CONSTRAINT Stud_cno_uk UNIQUE,
DOJ DATE DEFAULT SYSDATE);
--CONSTRAINT Stud_gender_ck1 CHECK(Gender IN('M', 'F')));
#Column level constraints :
CREATE TABLE Student (
StudentId INTEGER CONSTRAINT stud_sid_pk PRIMARY KEY,
FName VARCHAR2(10),
ContactNo NUMBER(10));
Table level constraints :
CREATE TABLE Student (
StudentId INTEGER,
FName VARCHAR2(10),
ContactNo NUMBER(10),
CONSTRAINT stud_sid_pk PRIMARY KEY (StudentId));
#NOT NULL can be defined only at the column level
Unlike Primary Key, UNIQUE constraint allows NULL values. A table can have many UNIQUE constraints.
Referential integrity constraint allows NULL value
#CREATE TABLE Marks(
CourseId INTEGER,
StudentId INTEGER CONSTRAINT marks_sid_fk REFERENCES Student(StudentId),
MarksScored DECIMAL(5,2));
CREATE TABLE Marks(
CourseId INTEGER,
StudentId INTEGER,
MarksScored DECIMAL(5,2),
CONSTRAINT marks_sid_fk FOREIGN KEY (StudentId) REFERENCES Student(StudentId));
#The data type of the column can be modified with the MODIFY clause. The size of the data type can be increased or decreased. The column should be empty for decreasing the size or for changing the data type from one type to another.
#The column dropped cannot be recovered back. At least one column should be present in the table after dropping the columns.
#Multiple rows can be inserted through a single INSERT statement only when it is used with SELECT statement.
#DISTINCT =
DISTINCT removes duplicates based on all the columns in the SELECT clause
Usage of DISTINCT should be avoided as far as possible due to performance issues
#Equal to operator cannot be used to check for NULL values (eg. - where bonus = NULL)
Trailing spaces are ignored for CHAR data type.
Leading spaces are not ignored for CHAR data type.
Trailing spaces are not ignored for VARCHAR2 data type.
Leading spaces are not ignored for VARCHAR2 data type.
#varchar will reserve space for null values while varchar2 will not reserve any space for null values.
#"Frank John's Wicked Grave Haunts Several Dull Owls" mnemonics for execution
#(date1 < date2) find answer? (past is smaller)
#Which of the following columns in a table cannot be updated?
A primary key column which also serves as foreign key reference in another table.
#TRUNCATE statement deletes all rows from the table as it does not support WHERE clause. TRUNCATE statement is a faster option compared to DELETE when you have to delete all rows from the table.
#Status Code:
ORA-00000
Successful Completion
ORA-00001
Unique constraint violation
ORA-00904
Invalid Identifier
ORA-0913
Too Many Values
#BETWEEN operator can be replaced by AND.
IN operator can be replaced by OR.
#There are two columns in every table - Rownum and Sysdate
#single row function produces one row of output for each row of input
multi row function produces just one row of output, irrespective of the number of rows in input
Numeric functions are single row functions that accept a numeric value and return numeric output. -
ABS, ROUND, FLOOR, CEIL
Character functions work on character strings and can return a character string or a numeric value. -
UPPER, LOWER, CONCAT, LENGTH
Substring function is used to extract part of a string. It has the following syntax
SUBSTR(value, start_position, length)
Use conversion functions to convert data from one format to another. -
TO_CHAR(value, format) ; TO_DATE (value, format) ; TO_NUMBER (value, format)
DATE Functions - SYSDATE, SYSTIMESTAMP, ADD_MONTHS(date, no_of_months), MONTHS_BETWEEN(date1, date2)
Aggregate functions operate on multiple rows to return a single row. Some aggregate functions like SUM (total), AVG (average) operates only on numeric columns while others like MIN (lowest value), MAX (highest value) and COUNT (number of rows) operate on all data types. All aggregate functions ignore NULL values except COUNT(*).
NVL(value1, value2)
: Substitutes value1 by value2 if value1 is NULL. The data type of value1 and value2 must be same.
USER
:
Returns the current logged in user
NVL2(v1,v2,v3)
NULLIF(v1,v2)
#MIN, MAX, COUNT can be applied on both number and character typed columns.
#CASE Statement can also be used with WHERE, GROUP BY etc. CASE statement has two different syntax styles: Simple CASE expression and Searched CASE expression.
Simple CASE expression can be used when all conditions check for equality against a single column.
Searched CASE expression is used to compare different conditions.
NOTE: ELSE is optional. If ELSE is omitted and no condition is true, then NULL is returned by CASE.
#Column position in the query can be used as an alternative to column name in ORDER BY clause
Select ID, ENAME, DOJ, SALARY, DEPT, DESIGNATION FROM Employee ORDER BY 2
#Having allows aggregate functions to be used as filter criteria which cannot be done using WHERE clause.
#GROUP BY must always be after the WHERE clause otherwise aggregate functions will be calculated wrongly.
Having must always be after Group by as it filters records based on aggregate functions calculated during GROUP BY evaluation
#Nested aggregate function cannot be used in SELECT clause without GROUP BY clause.
Columns specified in SELECT clause must be part of the GROUP BY clause.
a GROUP BY clause is mandatory in SELECT query having one aggregate function along with other columns
A query that has a nested aggregate function must have a GROUP BY clause
#Use UNION and UNION ALL clause to combine results from two or more SELECT statements. The select statements may be from same or different tables.They must have same number of columns and their data types at same position in both the query must be compatible (either same or convertible through automatic conversion).
JOINS
--SELECT ID, ENAME, E.COMPID AS E_COMPID, C.COMPID AS C_COMPID, MODEL FROM Employee E INNER JOIN Computer C ON E.COMPID = C.COMPID
--SELECT E.Id, E.ENAME FROM Employee E INNER JOIN Employee J ON E.Dept = J.Dept AND E.Id <> J.Id AND J.ENAME='Jack Abraham';
--select e.id empid, e.ename empname, m.id mgrid, m.ename mgrname, make from employee e inner join employee m on e.manager = m.id inner join computer c on e.compid = c.compid
--select s1.sid, s1.sname, s1.location from salesman s1 cross join salesman s2 where s1.location = s2.location and s1.sid <> s2.sid
--select s1.sid, s1.sname, sum((p.price * s2.quantity)) tamount, sum((p.price * s2.quantity * p.discount * 0.01)) tdiscount from salesman s1 inner join sale s3 on s1.sid = s3.sid inner join saledetail s2 on s3.saleid = s2.saleid inner join product p on s2.prodid = p.prodid group by s1.sid, s1.sname order by 3 desc, 4 desc
--select q.sname, q.quotationid from quotation q inner join orders o on q.quotationid = o.quotationid where abs(o.orderdate - o.delivereddate) <= 5
#For optimally, use filter condition Combining with the join condition using AND operator.
#LEFT OUTER JOIN: However care must be taken that the filter condition is using an attribute from the main table. Any attempt to filter (except check for NULL) using attribute from lookup table will result in wrong output as all NULL rows will get filtered and the purpose of using OUTER join will get defeated.
If we want to conditionally fetch values from the lookup table then the additional criteria must be combined with the join condition using AND operator.
#FULL OUTER JOIN: For two tables with p and q rows, a 1:1 relationship and m matched rows the total number of rows in the resultset is m + (p - m) + (q - m) = p + q - m.
ALTERNATE SYNTAX:
CROSS JOIN: SELECT E.ID, E.ENAME, E.COMPID AS E_COMPID, C.COMPID, C.Model FROM Employee E, Computer C
INNER JOIN: SELECT ID, ENAME, E.COMPID AS E_COMPID, C.COMPID AS C_COMPID, MODEL FROM Employee E, Computer C WHERE E.COMPID = C.COMPID
LEFT OUTER JOIN: SELECT ID, ENAME, E.COMPID AS E_COMPID, C.COMPID AS C_COMPID, MODEL FROM Employee E, Computer C WHERE E.COMPID = C.COMPID (+)
RIGHT OUTER JOIN: SELECT ID, ENAME, E.COMPID AS E_COMPID, C.COMPID AS C_COMPID, MODEL FROM Employee E, Computer C WHERE E.COMPID (+) = C.COMPID
SUBQUERY
--SELECT EName,Dept FROM Employee WHERE Salary>(SELECT AVG(Salary) FROM Employee);
--Interesting Question - http://ilp/fp2/Generic/DBMS/page/420
SELECT ename FROM emp WHERE ename = (SELECT MIN(ename) FROM emp);
--SELECT DEPT FROM Employee GROUP BY DEPT HAVING SUM(Salary) = (SELECT MAX(SUM(Salary)) FROM Employee GROUP BY Dept);
--SELECT CompId, Make, Model FROM Computer C WHERE EXISTS (SELECT 1 FROM Employee E WHERE E.CompId = C.CompId);
--SELECT CompId, Make, Model FROM Computer C WHERE NOT EXISTS (SELECT 1 FROM Employee E WHERE E.CompId = C.CompId);
#A subquery must be enclosed in brackets and can be used in SELECT, FROM, WHERE and HAVING clauses.
Subquery in SELECT and FROM clause are rarely used. Subqueries in WHERE and HAVING clauses are classified into ""Independent and Correlated subqueries"".
In an independent subquery, the inner and outer query are independent (the inner query has no reference to the outer query) of each other.
Subquery can be used in -
1] Field names in the SELECT statement
2] The FROM clause in the SELECT statement
3] The HAVING clause in the SELECT statement
4] The WHERE clause in SELECT as well as all DML statements
#A Correlated subquery is one in which the inner query that depends upon the outer query for it's execution. Specifically it uses a column from one of the tables in the outer query. The inner query is executed iteratively for each selected row of the outer query. In case of independent subquery, the inner query just executes once.
eg - SELECT EName, Dept, Salary FROM Employee E1 WHERE Salary > (SELECT AVG(Salary) FROM Employee E2 WHERE E2.Dept = E1.Dept)
#EXISTS keyword is used to check presence of rows in the subquery. The main query returns the row only if at least one row exists in the subquery. EXISTS clause follows short circuit logic i.e. the query calculation is terminated as soon as criteria is met. As a result it is generally faster than equivalent join statements.
#NOT EXISTS is opposite of EXISTS i.e. it is used to check absence of rows in the subquery. The main query returns the row only if at least no row exists in the subquery. It also uses short circuit logic and is hence faster.
#A subquery can appear on either side of a comparison operator
#Oracle database poses no limit on the number of subquery levels in the from clause of the top level query.
you can nest upto 255 levels of subqueries in the where clause.
#Inner queries in WHERE clause can not contain ORDER BY clause.
TRANSACTIONS
#Database provides three statements for transactions
1] SET Transaction
2] COMMIT
3] ROLLBACK
#AUTOCOMMIT property:
1] SET AUTOCOMMIT ON: Changes the mode of connection to ON. In this mode COMMIT command is automatically issued after every SQL statement that alters the state of a database.
2] SET AUTOCOMMIT OFF: Changes the mode of connection to OFF. In this mode user is expected to provide an explicit COMMIT or ROLLBACK command to complete the transaction.
3] SHOW AUTOCOMMIT
set aautocommit on will commit previous statement only if there is no commit/rollback in the next statement.
#All transactions exhibit ACID properties:
1] Acidity
2] Consistency
3] Isolation : Transactions execute in isolation of each other. In other words partial execution of one transaction is not visible to other transactions. Only committed data is visible to other transactions.
4] Durability : Once a transaction is committed, it is permanently saved, the data is preserved even in the case of power failure, hardware failure etc.
SELECT, DESCRIBE does not change the state of database.
NORMALIZATION
#An attribute A is said to functionally determine attribute B if each value of A is associated with only one value of B. A is called the Determinant while B is called the Dependent.
#Three types of FD:
1] Full FD
2] Partial FD
3] Transitive FD
#Normalization is the process of reorganizing data in a database so that data redundancy is reduced.
#2NF -
We decompose the relation by moving all the attributes that are dependent upon the partial candidate key (StudentId) into a new relation called Student. The partial candidate key (StudentId) becomes the primary key of this relation.
#
Online Transaction Processing
vs
Online Analytical Processing
Data
Dynamic (day to day transactional/Operational data)
Static (historical data)
Purpose of data
To control and run fundamental business task
Assist in planning, problem solving and decision making
Data Atomicity
Data is stored at the microscopic level
Data is aggregated or summarized and stored at a higher level
Normalization
Normalized databases to facilitate insertion, update, and deletion
De-normalized databases to facilitate queries and analysis
Operations
Simple operations that use small amounts of data
Complex queries using large amounts of data
Updates
Updates are frequent
Updates are infrequent
Response time
Fast response time is important.
Transactions are slow. Queries consume a lot of bandwidth
Data must be up-to-date, consistent at all times
Joins queries
Joins are more and complex as tables are normalized
Joins are few and simple as tables are de-normalized
#Index is a database object that allows us to search data in tables quickly and efficiently.
Indexes are also used to enforce data integrity. An index is automatically created for constraint enforcement when you create Primary Key and Unique constraint.
SELECT INDEX_NAME, COLUMN_NAME FROM USER_IND_COLUMNS WHERE TABLE_NAME = 'EMPLOYEE';
QUICK FACTS:
->Index for PRIMARY KEY and UNIQUE constraints are automatically created and dropped during table creation and deletion respectively.
->All other indexes have to be explicitly maintained by DBAs by issuing CREATE INDEX, ALTER INDEX or DROP INDEX commands. These indexes facilitate faster search and are created on columns that are used in search criteria i.e. the WHERE clause.
->Index is maintained automatically by the database server whenever data gets added, deleted or modified in the database.
->Size of index is directly proportional to size of data in primary table.
->Index can be created on multiple attributes as well. These are called composite index.
->Each table can have only one clustered index where data in the primary table is physically sorted in that order. Usually this is created on the primary key columns.
->There is no limit of number of non-clustered indexes that can be created on a database table.
->It increases the performance of SELECT statement but, decrease the performance of other query type.
->Indexes should be created on attributes used in the "Where" clause.
->A table can have only one clustered index and any number of non clustered indexes
#Data is retrieved from tables using Table Scan, Index Unique Scan or Index Range Scan. While the first process does not use an index at all, the other two leverage index created in database. Index is used only if sort order of data in the index can be leveraged for faster search.
#A full table scan is selected when:
--A large portion of the rows in the table must be accessed
--Index does not exist on the column being searched
--Index exist but cannot be used
#Index Unique Scan: It always retrieves one row from the database.
It is used when there is an equality predicate on a unique index or an index created as a result of a primary key constraint.
#Index Range Scan: It is used when SQL statement has an equality or range predicate (>, <, between) on a non-unique index key, or a range predicate on a unique index key. data is returned in the ascending order of index columns.
#Indexes cannot be used for negative criteria. Imagine you being asked to find all topics other than Union using the index of database book. A more sensible option is to do a table scan i.e. go page by page and omit any topic if it has UNION. E.g. - SELECT * FROM Employee WHERE Id <> 5
#Index cannot be used if expression is used with a column even if that column has an index present. This is because index has original values from table while here database has to look up calculated values which will not be present in the index. A table scan is performed.
#Index will be used if there is a partial match on a composite index. However the match must happen from the first column otherwise sorting cannot be leveraged.
#Index cannot be used if functions are used with a column even if that column has an index present. This is because index has original values from table while here database has to look up calculated values which will not be present in the index. A table scan is performed. You could create index on functions but for that the function must be deterministic.
#Index cannot be used if wildcard in LIKE operator is the first character.
#IF there is multiple index search then the database determines which one to choose based on which one results in faster filtering. This determination is made using database statistics.
#BEST PRACTICE:
Do not use Select ‘*’, instead fetch only attributes that are required. It improves code maintainability as clients invoking this query will not get impacted if new columns are added to the table in future.
BIG DATA
#Defining characteristics of data are:
1] Variety
2] Volume
3] Velocity
All the data put together is BIG DATA.
More than 85% of the world's data is unstructured (Facebook, Twitter, YouTube, NetFlix, Flickr etc.) and categorized as big data.
#NoSQL servers usually run on low cost hardware which can fail anytime. Therefore, the data is replicated and stored on multiple servers.
NoSQL databases are Partition Tolerant
#All databases should ideally provide the following features (CAP):
1] Consistency – An end user must be able to see the latest data at all times.
2] Availability – Every database request must be responded by the server.
3] Partition Tolerance – When two systems cannot talk to each other in a network, it is called network partition. Our DB system should continue to function even if there is a network partition.
#Eric Brewer’s CAP Theorem states that it is impossible for a distributed system to guarantee all three (Consistency, Availability and Partition Tolerance).
NoSQL databases are partition tolerant. In case of a network partition, there is a tradeoff between consistency and availability – some NoSQL systems give importance to Consistency while others give importance to Availability.
#NoSQL databases guarantee that eventually all replicas will be identical. This is called eventual consistency.
In contrast to NoSQL’s eventual consistency, SQL databases always provide consistent data. This is called strong consistency.
#NoSQL databases store data in four ways.
1] Key-Value Store (e.g. Riak, Redis, MemcacheDB)
2] Column Family Store (e.g. HBase, Cassandra)
3] Document-Oriented Database (e.g. MongoDB, CouchDB)
4] Graph Database (e.g. Neo4J, HyperGraphDB, InfoGrid)
#Operations on a Key-Value store
1] Get(key)
2] Put(key, value)
3] Multi-get(key1, key2, .., keyN)
4] Delete(key)
#Column Family Store characteristics:
1] Column Family databases can handle hundreds of terabytes of data easily.
2] Updates are performed without reading the row that contains it. Hence writes are very quick.
3] Column Family databases support real-time insertion of huge amount of data e.g. one million writes per second.
e.g. -
-Real-time weather data (min. temperature, max. temperature, air pressure etc.) collected through sensors at multiple locations.
-Log files of web servers for data analysis.
#Document Oriented databases have following characteristics: (MongoDB - a popular document-oriented database, open source)
1] Supports flexible schema, i.e. each document can have different attributes.
2] Provides a rich query language for storing, fetching, modifying and deleting data. This allows search and filter by any attribute value.
3] Ensures rapid application development e.g. In a fast changing retail store scenario, where products and their attributes are frequently changing.
4] Are suitable for storing data that does not require frequent updates, but is read many times.
#Graph Database characteristics:
1] Graph databases are suited for data that are heavily interconnected through relationships.
2] Graphs do not need joins for querying.
3] Graph databases use graph theory for traversal. It improves performance by keeping track of and thereby skipping nodes already visited.
4] Graph databases provide Atomicity, Consistency, Isolation and Durability (similar to SQL).
Cypher Query Language provided by Neo4j is used to query Graph Database:
e.g. - CREATE (u: Users { userid : 1, username : "Priya" });
MATCH (u: Users) WHERE u.username = "Priya" RETURN u;
Different NoSQL databases and relational databases complement each other as they are suitable for different scenarios. Increasingly applications are being designed to use multiple types of databases leveraging each for their strength in specific areas. This is called "polyglot persistence" (Using different databases for different purposes in a same project).
#MongoDB
Document Oriented
Cassandra
Column Family
Neo4J
Graph Database
Oracle
RDBMS
Riak
Key Value Store
#MongoDB:
->CRUD operations are performed through insert, find, update and remove operations on Collection object, NOT NULL, UNIQUE, FOREIGN KEY and CHECK constraints are not supported
->Joins and Subquery are not supported
->INSERT : db.emp.insert([ {_id: 2, ename: "Ethan", designation: "Manager", salary: 90000},
{_id: 3, ename: "Emily", designation: "Analyst", salary: 25000} ]);
->NOTE: If _id field is not specified then it is automatically generated with a unique value.
->Retrive data:
db.emp.find();
db.emp.find({designation: "Analyst"});
db.emp.find({}, {ename: 1, salary: 1});
db.emp.find({}, {ename: 1, salary: 1, _id: 0}); #In order not to display _id field
db.emp.find({salary: {$in: [30000, 90000]}});
db.emp.find({$or: [{designation:"Analyst"}, {ename:"Ethan"}]}, {ename: 1, salary: 1});
->Update:
db.emp.update({_id: 1}, {$set: {salary: 35000}});
db.emp.update({_id: 2}, {$set: {dept: "HR"}}); #Adding new Fields
db.emp.update({_id: 4}, {$set: {designation: "Analyst", dept: "ETA"}});
db.emp.update({}, {$set: {salary: 10000}}, {multi: true});
db.emp.update({_id: 3}, {ename: "Mark", designation: "Manager"});
->Delete:
db.emp.remove({_id: 4});
db.emp.remove({salary: {$lt: 30000}});
db.emp.remove({});
#Which of the following pairs of queries are equivalent under all circumstances?
1]SELECT DeptNo FROM Emp GROUP BY DeptNo;
SELECT DISTINCT DeptNo FROM Emp;
2]SELECT D.* FROM Emp E, Dept D WHERE D.DeptNo = E.DeptNo
SELECT * FROM Dept WHERE DeptNo IN (SELECT DeptNo from Emp)
3]SELECT AVG(DeptNo) As AVG FROM Emp;
SELECT SUM(DeptNo) / COUNT(*) AS AVG FROM Emp;
4]SELECT COUNT(*) - COUNT(DeptNo) FROM Emp;
SELECT COUNT(CASE WHEN DeptNo IS NULL THEN 0 ELSE NULL END) FROM Emp;
ANS - 1,4