The Wizard Coder

Tuesday, 7 August 2018

DBMS NOTES

Software Engineering

#software development activities required a systematic and disciplined approach i.e. Engineering Approach.

#Quality is not a feature which can be added to software after creating it! Quality must be built into software from the first day of development.

#SDLC models ---
Waterfall model
Incremental/Iterative model
Spiral model

#Requirement analysis
Activities:

Define the scope of the work
Understand processes
Focus on what needs to be automated
Understand functional and non-functional requirements
Document and Verify Requirements

Deliverables:

Software Requirement Specification(SRS)
Acceptance Test Plan
System Test Plan

#Functional Requirements:

Number of passengers booked per transaction should be max 6
SMS confirmation has to be sent to the registered mobile number on completing a successful transaction

Non Functional Requirements:

Time taken to complete a transaction should not be more than 3 minutes
1000 users should be able to use the application simultaneously

#waterfall model - design
Activities:

Create blueprint of the software
Define structure and behaviour of modules

Deliverables:

High Level Design
Detailed Design
Integration Test Plan
Unit Test Plan

#Coding
Activities:

Build the software
Unit test

Deliverables:

Unit tested code

#Configuration Management is the task of tracking and controlling changes in the software. It is achieved through version control tools like git, svn etc.
Configuration Management ensures:

We are using the right version
Deliverables are consistent with each other

Activity -
Version COntrol
Simultaneous USage
Change COntrol
Backup

# A class without any instance variable also follows Abstraction design principle

#Testing -
Activities:

Ensure that requirements are met

Deliverables:

Integrated and System tested software

#
Unit testing = developers
Integration testing = Development team and testing team
System Testing = Test team
Acceptance Testing = Customer

#We write "Unit A Driver" to test "Unit B" and "Unit C"
We write "Unit B stub" to test "Unit A"

#Regression testing =
While doing changes programmers might end up introducing some new defects in the software. Testing and fixing such defects is known as Regression testing.
In the maintenance phase the product must be tested against previous test cases.

#Performance Testing =
1] Load testing == Maximum operating capacity
2] Endurance testing == sustain continuous expected load
3] Stress Testing == determine a breaking point of system (stability) (expected situation)
4] Spike testing == ability to handle dramatic changes of load (unexpected situation)

#Deployment -
Activities:

Assemble
Install
End-user Training and
Sign-off

Deliverables:

Implemented Software
User manual

#Waterfall model is not suitable for:

Small projects
Complex projects
Accommodating changes (Correct Answer)
Maintenance Projects

#TOp down testing uses Stubs

#Agile software development is a group of software development methods in which requirements and solutions evolve through collaboration between self-organizing, cross-functional teams. It promotes adaptive planning, evolutionary development, early delivery, continuous improvement, and encourages rapid and flexible response to change.

#Sacrum Cycle ==

Product Backlog: The product backlog comprises an ordered list of requirements that a scrum team maintains for a product. It consists of features, bug fixes, functional, non functional requirements, change requests, enhancements etc.

Sprint Backlog: The sprint backlog is the list of tasks the development team must address during the upcoming sprint.

Sprint: A sprint (or iteration) is the basic unit of development in scrum. The sprint is a timeboxed effort; that is, it is restricted to a specific duration.

Working increment of software: The increment (or potentially shippable increment, PSI) is the sum of all the product backlog items completed during a sprint and all previous sprints.

#Sprint is time boxed development period in Scrum.
Each sprint starts with a sprint planning: event that identify the work for the sprint, and make an estimated commitment for the sprint goal.
Each sprint produces the planned increment of the product: outcome could be a working code.
Inspect: the client performs the acceptance testing during inspection and provides feedback.
Adapt: the team incorporates the feedback provided by the client to improve the work during subsequent sprint.
CYCLE = PLAN > PRODUCE > INSPECT > ADAPT

#Agile adoption results in frequent delivery of features to end users. Thus testing has to be performed more often. It also increases the probability of defects through manual errors. Most often there is no time to manually test the entire application as releases could often be only weeks apart.

Test automation is a key strategy to achieve faster delivery cycles. Agile teams use tools like xUnit, QTP, Selenium etc. to automate Unit testing, Functional testing, Regression testing etc. Test cases are written as code using these frameworks and can be executed at every build to detect bugs.

Another practice usually employed in agile projects is Continuous Integration. It essentially means that developers integrate their code to a common region frequently (often daily or multiple times a day). It's main aim is to detect and fix integration problems at an early stage.

#
Traditional Methods Agile Methods
Predictive Adaptive
Requirement changes are resisted Changes are welcome
Limited Interaction with Business/ Product and Testing Collaboration between various teams
Single Delivery Multiple Short / Small Deliveries
Process and Documentation Focus Communication Focus

#Advantages of Agile ==
1] Time to Market: Early and regular releases
2] Revenue: Early Return On Investment (ROI)
3] High Quality and Productivity: Testing is integrated throughout the cycle
4] Business Engagement/Customer Satisfaction: Active involvement of a business owner and the high visibility of the product and progress
5] Motivated Teams: Active involvement & collaboration make Agile development more enjoyable

#Which is more important, the Product or the Process?
--Process decides the Quality of the Product. Therefore process is important.

#Quality Control

Focus is on the Product
QC measures a product against the existence of a required attribute
Major QC activity is identifying defects and correcting them (Rework)
Inspections, Reviews and Testing activities

Quality Assurance

Focus is on the Process rather than Product
QA ensures "Fitness for Purpose"
Auditing and Reporting functions
Building process guidelines, checklists, templates, Training activities

#Cost of Quality (CoQ) is a measure of costs (effort) incurred in ensuring quality. These costs include:

Prevention cost such as training, process improvement etc.
Appraisal cost such as reviews, testing etc.
Correction/Rework/Failure cost such as rework to fix defects after testing etc.
CoQ is typically measured as percentage of effort spent to ensure quality compared to total project effort. Thus CoQ of 31.38% means that compared to overall effort, the team has spent 31.38% of effort to ensure quality.

CoQ = ((Prevention cost + Appraisal cost + Failure cost) x 100) / Total Project effort
Appraisal Cost, Failure Cost, Prevention Cost and Total Project Effort are all measured in person months.

#Ensures all requirements are implemented in the software. Requirement Traceability Matrix (RTM) maintains reference between: (Eg - page 60)

Requirement Number
Requirement Description
HLD Reference and DD Reference
Unit Test Plan Reference, Integration Test Plan Reference
System Test Plan Reference
Horizontal traceability ensures all requirements are incorporated into product.

Vertical traceability ensures no unnecessary functionality is included unless specifically called for by a requirement.

#defect classification -
1] logical
2] User Interface
3] Maintainability
4] Standards

#If a requirement defect slips from requirements stage unnoticed, and gets caught during later phases, it becomes much more costlier to fix them.

#Pareto principle states that, for many events, roughly 80% of the effects come from 20% of the causes. In software development context 80% of the defects are due to 20% of the causes.

#Basic MAtrics at Infosys:
1] Efforts
2] Defects
3] Schedule
4] Size

#Other derived Metrics used at Infosys

Defect Injection Rate
Defect Removal Efficiency
Review Effectiveness
Turn Around Time

#Process Infrastructure:
1] PRIDE
2] IPM+
3] ENCORE

DataBase

#Reference =
http://ilp/fp2/Generic/DBMS/page/78
http://ilp/fp2/Generic/DBMS/page/99

#
--CREATE TABLE Department(D_id number(3))
--SELECT * FROM user_tables
--SELECT * FROM TAB
--ALTER TABLE department ADD d_name varchar2(30)
--DROP TABLE department
--DESC department
--alter table Department add constraint Dept_UK Unique(Dept_Name)
--alter table Department rename constraint <old_name> to <new_name>
--SELECT EName, Salary * 2 AS Double_Salary FROM Employee
--SELECT EName, 30 AS Value FROM Employee
--SELECT DISTINCT Dept FROM Employee (NULL is also distinct)
--SELECT DISTINCT Dept, Manager FROM EMPLOYEE (for multiple columns distinct)
--SELECT ID, ENAME FROM Employee WHERE SALARY BETWEEN 30000 and 50000
--UPDATE Employee SET SALARY = SALARY * 1.3, BONUS = SALARY * 0.30 WHERE ID = 1
--TRUNCATE table Employee
--DELETE FROM Employee
--DELETE FROM Employee WHERE Id = 5
--select empid, salary AS "Current Salary", (salary * 1.1) AS "New Salary", (salary * 0.1) AS "Incremented Amount" from empdetails
--SELECT City, Country, CONCAT(City, Country) "CONCAT", City || Country "ConcatByOperator", CONCAT(CONCAT(City, ', '), Country) "NestedConcat" FROM Weather;
--SELECT RecordDate, SUBSTR(RecordDate,1,2) "DAY", SUBSTR(RecordDate,4,3) "MONTH", SUBSTR(RecordDate,8) "YEAR" FROM Weather;
--SELECT MinTemp, TO_CHAR(MinTemp) DEF_FORMAT, TO_CHAR(MinTemp, '999.99') "FIXED_DIGITS", TO_CHAR(MinTemp, '9,9.99') "COMMA" FROM Weather;
--SELECT '1000.98' "ORIG_NOFORMAT", TO_NUMBER('1000.98') "CONV_NOFORMAT", '1,000.98' "ORIG_FORMAT", TO_NUMBER('1,000.98', '9,999.99') "CONV_FORMAT" FROM DUAL;
--SELECT RecordDate, TO_CHAR(RecordDate, 'MON') "MONTH", TO_CHAR(RecordDate, 'Month') "FULL_MONTH", TO_CHAR(RecordDate, 'Dy') "DAY", TO_CHAR(RecordDate, 'Day') "FULL_DAY" FROM Weather;
--SELECT TO_CHAR(RecordDate) DEF_FORMAT, TO_CHAR(RecordDate, 'DD/MM/CCYY') INDIAN, TO_CHAR(RecordDate, 'MM/DD/YY') AMERICAN FROM Weather;
--SELECT '01-Jan-2014' DATE_STRING, TO_DATE('01-Jan-2014') CONV_NOFORMAT, TO_DATE('01-Jan-2014', 'DD-Mon-YYYY') CONV_FORMAT FROM DUAL
--SELECT 'Jan-01-2014' DATE_STRING, TO_DATE('Jan-01-2014', 'Mon-DD-YYYY') CONV_FORMAT FROM DUAL
--SELECT COUNT(Dept) Count1, COUNT(DISTINCT Dept) Count2 FROM Employee
--SELECT CITY, NVL(CITY, 'Not Available') NVL_CITY, MINTEMP, NVL(MINTEMP, 0.0) NVL_MINTEMP, NVL(TO_CHAR(MINTEMP), 'Not Available') NVL_MINTEMP2 from Weather;
--select count(*) sale_count from sale where add_months(sldate, 40) > sysdate #query to display the number of sales that were made in the last 40 months
--insert all
into Student values(20, 'Amit')
into Student values(21, 'Nik')
into Student values(22, 'Raj')
--SELECT Id, EName, Designation, Salary,
CASE Designation
WHEN 'SE' THEN Salary * 1.2
WHEN 'SSE' THEN Salary * 1.1
ELSE Salary * 1.05
END New_Salary
FROM Employee;
--SELECT EName, Designation, Bonus,
CASE
WHEN Designation = 'SE' THEN Bonus + 500
WHEN Designation = 'SSE' THEN Bonus + 1000
WHEN Designation = 'PM' THEN Bonus + 2000
ELSE Bonus
END AS NewBonus
FROM Employee;
--SELECT Id, EName, Designation, Salary,
CASE
WHEN Designation = 'SE' OR Designation = 'SSE' THEN TO_CHAR(Salary * 1.2)
WHEN Designation = 'PM' AND Salary >= 90000 THEN 'No hike'
ELSE TO_CHAR(Salary * 1.05 )
END New_Salary
FROM Employee;
--Select ID, ENAME, DOJ, SALARY, DEPT, DESIGNATION FROM Employee ORDER BY DEPT ASC, DESIGNATION DESC
--select to_char(sldate, 'Month') MONTH, count(*) NUmber_sale from sale group by to_char(sldate, 'Month') order by 2 desc
--select prodid, sum(quantity) qty_sold from saledetail where quantity > 1 group by prodid having count(*) > 1
--select distinct Itemtype,
case
when price between 0 and 499 then 'Cheap'
when price between 500 and 1999 then 'Affordable'
when price between 2000 and 4999 then 'Expensive'
when price >= 5000 then 'Very Expensive'
end as classification
from item
order by 1,2
--select to_char(to_date('Jan/10/2015', 'Mon/DD/YYYY'), 'Month') month, to_number('2,50,000.00', '9,99,999.99') amount from dual
--select pymtmode, count(*) pymtcount from orders where to_number(to_char(pymtdate, 'YY'),'99') < 15 group by pymtmode having count(*) > 1
--select to_char(sysdate, 'Mon/DD/YYYY Day') currentdate from dual
--select orderid, round(abs(months_between(orderdate, pymtdate)* 30)) noofdays from orders
--select count(distinct itemtype) noofitemtypes from item

#A file-based approach suffers following problems:
1] Dependency of program on physical structure of data
2] Complex process to retrieve data
3] Loss of data on concurrent access
4] Inability to give access based on record (Security)
5] Data redundancy

#Database is a shared collection of logically related data
Database Management System is a software system that enables users to define, create, maintain, and control access to the database.
Application Program interacts with a database by issuing an appropriate request

#DBMS offers -
Data Management, Integrity, Transaction, Concurrency, Security, Recovery and Utilities

#Types of DataBase System:
1] Hierarchical
2] Network
3] Relational
4] NoSQL

#Tables = Relations
Attributes = Columns = Fields
Rows = Records = Tuples
Number of Columns = Degree of Relation
Number of Records = Cardinality of Relation

#A relationship among a set of values in a table is represented by a "ROW"

#Data integrity refers to maintaining and assuring the accuracy and consistency of data over its entire life-cycle.
Types of Integrity Constraints:
1] Entity Integrity (Primary Key) (between rows)
2] Domain Integrity (DATA TYPES, CHECK CONSTRAINT) (between columns)
3] Referential Integrity (Foreign Key) (between tables)

#When two or more columns together identify the unique row then it's referred to as Composite Primary Key.

#Terms of Foreign KEys:
Parent Table = Master table = Referenced table
Child table = Referencing table

#ER model is a graphical representation of entities and their relationships which helps in understanding data independent of the actual database implementation.
Entity = Real world Object
Attribute = Property of Entity

#Crow foot notation is one of the ways to represent cardinality of relationship in an ER Model.

#In 1 : N relationships, the foreign key and relationship attributes are always added to the many (N) side of the relationship.

#An attribute in a table/relation can reference another attribute in the same table/relation and this is called Self Referencing Foreign Key

#CHAR = CHARACTER
VARCHAR2 = CHARACTER VARYING

#SQL supports SMALLINT, INTEGER and INT data types that are used for storing whole numbers.
SQL, unlike programming languages, does not provide support for arbitrary length numbers i.e. numbers not bound by size limits. For e.g. Python supports "bignum" and Java supports "BigInteger" data types.

#Nonintegral data types have an integer part and a fractional part. Either NUMERIC, DECIMAL or NUMBER data types can be used to store nonintegral numbers.

#SQL supports following data types for representing date and large objects:
1] DATE (default format = DD-MON-YY )
2] TIMESTAMP (Storing date data with precision up-to 1 billionth (9 digits) of a second.)
3] CLOB (Character Large Object) (More than 4000 bytes data)
4] BLOB (Binary Large Object) (movies, images with size up to 4GB)

#Not equal to = <>

#Various constraints that can be created on database tables are:

NOT NULL
PRIMARY KEY
CHECK
UNIQUE
FOREIGN KEY

We can also specify DEFAULT value for a column. Oracle database does not consider DEFAULT as a constraint.

#CREATE TABLE Student (
StudentId INTEGER CONSTRAINT stud_sid_pk PRIMARY KEY,
FName VARCHAR2(10) CONSTRAINT Stud_Fname_nn NOT NULL,
LName VARCHAR2(10),
Gender CHAR(1) CONSTRAINT Stud_gender_ck1 CHECK(Gender IN('M', 'F')),
ContactNo NUMBER(10) CONSTRAINT Stud_cno_uk UNIQUE,
DOJ DATE DEFAULT SYSDATE);
--CONSTRAINT Stud_gender_ck1 CHECK(Gender IN('M', 'F')));

#Column level constraints :
CREATE TABLE Student (
StudentId INTEGER CONSTRAINT stud_sid_pk PRIMARY KEY,
FName VARCHAR2(10),
ContactNo NUMBER(10));

Table level constraints :
CREATE TABLE Student (
StudentId INTEGER,
FName VARCHAR2(10),
ContactNo NUMBER(10),
CONSTRAINT stud_sid_pk PRIMARY KEY (StudentId));

#NOT NULL can be defined only at the column level
Unlike Primary Key, UNIQUE constraint allows NULL values. A table can have many UNIQUE constraints.
Referential integrity constraint allows NULL value

#CREATE TABLE Marks(
CourseId INTEGER,
StudentId INTEGER CONSTRAINT marks_sid_fk REFERENCES Student(StudentId),
MarksScored DECIMAL(5,2));

CREATE TABLE Marks(
CourseId INTEGER,
StudentId INTEGER,
MarksScored DECIMAL(5,2),
CONSTRAINT marks_sid_fk FOREIGN KEY (StudentId) REFERENCES Student(StudentId));

#The data type of the column can be modified with the MODIFY clause. The size of the data type can be increased or decreased. The column should be empty for decreasing the size or for changing the data type from one type to another.

#The column dropped cannot be recovered back. At least one column should be present in the table after dropping the columns.

#Multiple rows can be inserted through a single INSERT statement only when it is used with SELECT statement.

#DISTINCT =
DISTINCT removes duplicates based on all the columns in the SELECT clause
Usage of DISTINCT should be avoided as far as possible due to performance issues

#Equal to operator cannot be used to check for NULL values (eg. - where bonus = NULL)
Trailing spaces are ignored for CHAR data type.
Leading spaces are not ignored for CHAR data type.
Trailing spaces are not ignored for VARCHAR2 data type.
Leading spaces are not ignored for VARCHAR2 data type.

#varchar will reserve space for null values while varchar2 will not reserve any space for null values.

#"Frank John's Wicked Grave Haunts Several Dull Owls" mnemonics for execution

#(date1 < date2) find answer? (past is smaller)

#Which of the following columns in a table cannot be updated?
A primary key column which also serves as foreign key reference in another table.

#TRUNCATE statement deletes all rows from the table as it does not support WHERE clause. TRUNCATE statement is a faster option compared to DELETE when you have to delete all rows from the table.

#Status Code:
ORA-00000 Successful Completion
ORA-00001 Unique constraint violation
ORA-00904 Invalid Identifier
ORA-0913 Too Many Values

#BETWEEN operator can be replaced by AND.
IN operator can be replaced by OR.

#There are two columns in every table - Rownum and Sysdate

#single row function produces one row of output for each row of input
multi row function produces just one row of output, irrespective of the number of rows in input

Numeric functions are single row functions that accept a numeric value and return numeric output. -
ABS, ROUND, FLOOR, CEIL
Character functions work on character strings and can return a character string or a numeric value. -
UPPER, LOWER, CONCAT, LENGTH
Substring function is used to extract part of a string. It has the following syntax
SUBSTR(value, start_position, length)
Use conversion functions to convert data from one format to another. -
TO_CHAR(value, format) ; TO_DATE (value, format) ; TO_NUMBER (value, format)
DATE Functions - SYSDATE, SYSTIMESTAMP, ADD_MONTHS(date, no_of_months), MONTHS_BETWEEN(date1, date2)

Aggregate functions operate on multiple rows to return a single row. Some aggregate functions like SUM (total), AVG (average) operates only on numeric columns while others like MIN (lowest value), MAX (highest value) and COUNT (number of rows) operate on all data types. All aggregate functions ignore NULL values except COUNT(*).
NVL(value1, value2) : Substitutes value1 by value2 if value1 is NULL. The data type of value1 and value2 must be same.
USER : Returns the current logged in user
NVL2(v1,v2,v3)
NULLIF(v1,v2)

#MIN, MAX, COUNT can be applied on both number and character typed columns.

#CASE Statement can also be used with WHERE, GROUP BY etc. CASE statement has two different syntax styles: Simple CASE expression and Searched CASE expression.
Simple CASE expression can be used when all conditions check for equality against a single column.
Searched CASE expression is used to compare different conditions.
NOTE: ELSE is optional. If ELSE is omitted and no condition is true, then NULL is returned by CASE.

#Column position in the query can be used as an alternative to column name in ORDER BY clause
Select ID, ENAME, DOJ, SALARY, DEPT, DESIGNATION FROM Employee ORDER BY 2

#Having allows aggregate functions to be used as filter criteria which cannot be done using WHERE clause.

#GROUP BY must always be after the WHERE clause otherwise aggregate functions will be calculated wrongly.
Having must always be after Group by as it filters records based on aggregate functions calculated during GROUP BY evaluation

#Nested aggregate function cannot be used in SELECT clause without GROUP BY clause.
Columns specified in SELECT clause must be part of the GROUP BY clause.
a GROUP BY clause is mandatory in SELECT query having one aggregate function along with other columns
A query that has a nested aggregate function must have a GROUP BY clause

#Use UNION and UNION ALL clause to combine results from two or more SELECT statements. The select statements may be from same or different tables.They must have same number of columns and their data types at same position in both the query must be compatible (either same or convertible through automatic conversion).

JOINS

--SELECT ID, ENAME, E.COMPID AS E_COMPID, C.COMPID AS C_COMPID, MODEL FROM Employee E INNER JOIN Computer C ON E.COMPID = C.COMPID
--SELECT E.Id, E.ENAME FROM Employee E INNER JOIN Employee J ON E.Dept = J.Dept AND E.Id <> J.Id AND J.ENAME='Jack Abraham';
--select e.id empid, e.ename empname, m.id mgrid, m.ename mgrname, make from employee e inner join employee m on e.manager = m.id inner join computer c on e.compid = c.compid
--select s1.sid, s1.sname, s1.location from salesman s1 cross join salesman s2 where s1.location = s2.location and s1.sid <> s2.sid
--select s1.sid, s1.sname, sum((p.price * s2.quantity)) tamount, sum((p.price * s2.quantity * p.discount * 0.01)) tdiscount from salesman s1 inner join sale s3 on s1.sid = s3.sid inner join saledetail s2 on s3.saleid = s2.saleid inner join product p on s2.prodid = p.prodid group by s1.sid, s1.sname order by 3 desc, 4 desc
--select q.sname, q.quotationid from quotation q inner join orders o on q.quotationid = o.quotationid where abs(o.orderdate - o.delivereddate) <= 5

#For optimally, use filter condition Combining with the join condition using AND operator.

#LEFT OUTER JOIN: However care must be taken that the filter condition is using an attribute from the main table. Any attempt to filter (except check for NULL) using attribute from lookup table will result in wrong output as all NULL rows will get filtered and the purpose of using OUTER join will get defeated.
If we want to conditionally fetch values from the lookup table then the additional criteria must be combined with the join condition using AND operator.

#FULL OUTER JOIN: For two tables with p and q rows, a 1:1 relationship and m matched rows the total number of rows in the resultset is m + (p - m) + (q - m) = p + q - m.

ALTERNATE SYNTAX:
CROSS JOIN: SELECT E.ID, E.ENAME, E.COMPID AS E_COMPID, C.COMPID, C.Model FROM Employee E, Computer C
INNER JOIN: SELECT ID, ENAME, E.COMPID AS E_COMPID, C.COMPID AS C_COMPID, MODEL FROM Employee E, Computer C WHERE E.COMPID = C.COMPID
LEFT OUTER JOIN: SELECT ID, ENAME, E.COMPID AS E_COMPID, C.COMPID AS C_COMPID, MODEL FROM Employee E, Computer C WHERE E.COMPID = C.COMPID (+)
RIGHT OUTER JOIN: SELECT ID, ENAME, E.COMPID AS E_COMPID, C.COMPID AS C_COMPID, MODEL FROM Employee E, Computer C WHERE E.COMPID (+) = C.COMPID

SUBQUERY

--SELECT EName,Dept FROM Employee WHERE Salary>(SELECT AVG(Salary) FROM Employee);
--Interesting Question - http://ilp/fp2/Generic/DBMS/page/420
SELECT ename FROM emp WHERE ename = (SELECT MIN(ename) FROM emp);
--SELECT DEPT FROM Employee GROUP BY DEPT HAVING SUM(Salary) = (SELECT MAX(SUM(Salary)) FROM Employee GROUP BY Dept);
--SELECT CompId, Make, Model FROM Computer C WHERE EXISTS (SELECT 1 FROM Employee E WHERE E.CompId = C.CompId);
--SELECT CompId, Make, Model FROM Computer C WHERE NOT EXISTS (SELECT 1 FROM Employee E WHERE E.CompId = C.CompId);

#A subquery must be enclosed in brackets and can be used in SELECT, FROM, WHERE and HAVING clauses.
Subquery in SELECT and FROM clause are rarely used. Subqueries in WHERE and HAVING clauses are classified into ""Independent and Correlated subqueries"".
In an independent subquery, the inner and outer query are independent (the inner query has no reference to the outer query) of each other.
Subquery can be used in -
1] Field names in the SELECT statement
2] The FROM clause in the SELECT statement
3] The HAVING clause in the SELECT statement
4] The WHERE clause in SELECT as well as all DML statements

#A Correlated subquery is one in which the inner query that depends upon the outer query for it's execution. Specifically it uses a column from one of the tables in the outer query. The inner query is executed iteratively for each selected row of the outer query. In case of independent subquery, the inner query just executes once.
eg - SELECT EName, Dept, Salary FROM Employee E1 WHERE Salary > (SELECT AVG(Salary) FROM Employee E2 WHERE E2.Dept = E1.Dept)

#EXISTS keyword is used to check presence of rows in the subquery. The main query returns the row only if at least one row exists in the subquery. EXISTS clause follows short circuit logic i.e. the query calculation is terminated as soon as criteria is met. As a result it is generally faster than equivalent join statements.

#NOT EXISTS is opposite of EXISTS i.e. it is used to check absence of rows in the subquery. The main query returns the row only if at least no row exists in the subquery. It also uses short circuit logic and is hence faster.

#A subquery can appear on either side of a comparison operator

#Oracle database poses no limit on the number of subquery levels in the from clause of the top level query.
you can nest upto 255 levels of subqueries in the where clause.

#Inner queries in WHERE clause can not contain ORDER BY clause.

TRANSACTIONS

#Database provides three statements for transactions
1] SET Transaction
2] COMMIT
3] ROLLBACK

#AUTOCOMMIT property:
1] SET AUTOCOMMIT ON: Changes the mode of connection to ON. In this mode COMMIT command is automatically issued after every SQL statement that alters the state of a database.
2] SET AUTOCOMMIT OFF: Changes the mode of connection to OFF. In this mode user is expected to provide an explicit COMMIT or ROLLBACK command to complete the transaction.
3] SHOW AUTOCOMMIT

set aautocommit on will commit previous statement only if there is no commit/rollback in the next statement.

#All transactions exhibit ACID properties:
1] Acidity
2] Consistency
3] Isolation : Transactions execute in isolation of each other. In other words partial execution of one transaction is not visible to other transactions. Only committed data is visible to other transactions.
4] Durability : Once a transaction is committed, it is permanently saved, the data is preserved even in the case of power failure, hardware failure etc.

SELECT, DESCRIBE does not change the state of database.

NORMALIZATION

#An attribute A is said to functionally determine attribute B if each value of A is associated with only one value of B. A is called the Determinant while B is called the Dependent.

#Three types of FD:
1] Full FD
2] Partial FD
3] Transitive FD

#Normalization is the process of reorganizing data in a database so that data redundancy is reduced.

#2NF -
We decompose the relation by moving all the attributes that are dependent upon the partial candidate key (StudentId) into a new relation called Student. The partial candidate key (StudentId) becomes the primary key of this relation.

# Online Transaction Processing vs Online Analytical Processing
Data Dynamic (day to day transactional/Operational data) Static (historical data)
Purpose of data To control and run fundamental business task Assist in planning, problem solving and decision making
Data Atomicity Data is stored at the microscopic level Data is aggregated or summarized and stored at a higher level
Normalization Normalized databases to facilitate insertion, update, and deletion De-normalized databases to facilitate queries and analysis
Operations Simple operations that use small amounts of data Complex queries using large amounts of data
Updates Updates are frequent Updates are infrequent
Response time Fast response time is important. Transactions are slow. Queries consume a lot of bandwidth
Data must be up-to-date, consistent at all times
Joins queries Joins are more and complex as tables are normalized Joins are few and simple as tables are de-normalized

#Index is a database object that allows us to search data in tables quickly and efficiently.
Indexes are also used to enforce data integrity. An index is automatically created for constraint enforcement when you create Primary Key and Unique constraint.

SELECT INDEX_NAME, COLUMN_NAME FROM USER_IND_COLUMNS WHERE TABLE_NAME = 'EMPLOYEE';

QUICK FACTS:
->Index for PRIMARY KEY and UNIQUE constraints are automatically created and dropped during table creation and deletion respectively.
->All other indexes have to be explicitly maintained by DBAs by issuing CREATE INDEX, ALTER INDEX or DROP INDEX commands. These indexes facilitate faster search and are created on columns that are used in search criteria i.e. the WHERE clause.
->Index is maintained automatically by the database server whenever data gets added, deleted or modified in the database.
->Size of index is directly proportional to size of data in primary table.
->Index can be created on multiple attributes as well. These are called composite index.
->Each table can have only one clustered index where data in the primary table is physically sorted in that order. Usually this is created on the primary key columns.
->There is no limit of number of non-clustered indexes that can be created on a database table.
->It increases the performance of SELECT statement but, decrease the performance of other query type.
->Indexes should be created on attributes used in the "Where" clause.
->A table can have only one clustered index and any number of non clustered indexes

#Data is retrieved from tables using Table Scan, Index Unique Scan or Index Range Scan. While the first process does not use an index at all, the other two leverage index created in database. Index is used only if sort order of data in the index can be leveraged for faster search.

#A full table scan is selected when:
--A large portion of the rows in the table must be accessed
--Index does not exist on the column being searched
--Index exist but cannot be used

#Index Unique Scan: It always retrieves one row from the database.
It is used when there is an equality predicate on a unique index or an index created as a result of a primary key constraint.

#Index Range Scan: It is used when SQL statement has an equality or range predicate (>, <, between) on a non-unique index key, or a range predicate on a unique index key. data is returned in the ascending order of index columns.

#Indexes cannot be used for negative criteria. Imagine you being asked to find all topics other than Union using the index of database book. A more sensible option is to do a table scan i.e. go page by page and omit any topic if it has UNION. E.g. - SELECT * FROM Employee WHERE Id <> 5

#Index cannot be used if expression is used with a column even if that column has an index present. This is because index has original values from table while here database has to look up calculated values which will not be present in the index. A table scan is performed.

#Index will be used if there is a partial match on a composite index. However the match must happen from the first column otherwise sorting cannot be leveraged.

#Index cannot be used if functions are used with a column even if that column has an index present. This is because index has original values from table while here database has to look up calculated values which will not be present in the index. A table scan is performed. You could create index on functions but for that the function must be deterministic.

#Index cannot be used if wildcard in LIKE operator is the first character.

#IF there is multiple index search then the database determines which one to choose based on which one results in faster filtering. This determination is made using database statistics.

#BEST PRACTICE:
Do not use Select ‘*’, instead fetch only attributes that are required. It improves code maintainability as clients invoking this query will not get impacted if new columns are added to the table in future.

BIG DATA

#Defining characteristics of data are:
1] Variety
2] Volume
3] Velocity

All the data put together is BIG DATA.
More than 85% of the world's data is unstructured (Facebook, Twitter, YouTube, NetFlix, Flickr etc.) and categorized as big data.

#NoSQL servers usually run on low cost hardware which can fail anytime. Therefore, the data is replicated and stored on multiple servers.
NoSQL databases are Partition Tolerant

#All databases should ideally provide the following features (CAP):
1] Consistency – An end user must be able to see the latest data at all times.
2] Availability – Every database request must be responded by the server.
3] Partition Tolerance – When two systems cannot talk to each other in a network, it is called network partition. Our DB system should continue to function even if there is a network partition.

#Eric Brewer’s CAP Theorem states that it is impossible for a distributed system to guarantee all three (Consistency, Availability and Partition Tolerance).
NoSQL databases are partition tolerant. In case of a network partition, there is a tradeoff between consistency and availability – some NoSQL systems give importance to Consistency while others give importance to Availability.

#NoSQL databases guarantee that eventually all replicas will be identical. This is called eventual consistency.
In contrast to NoSQL’s eventual consistency, SQL databases always provide consistent data. This is called strong consistency.

#NoSQL databases store data in four ways.
1] Key-Value Store (e.g. Riak, Redis, MemcacheDB)
2] Column Family Store (e.g. HBase, Cassandra)
3] Document-Oriented Database (e.g. MongoDB, CouchDB)
4] Graph Database (e.g. Neo4J, HyperGraphDB, InfoGrid)

#Operations on a Key-Value store
1] Get(key)
2] Put(key, value)
3] Multi-get(key1, key2, .., keyN)
4] Delete(key)

#Column Family Store characteristics:
1] Column Family databases can handle hundreds of terabytes of data easily.
2] Updates are performed without reading the row that contains it. Hence writes are very quick.
3] Column Family databases support real-time insertion of huge amount of data e.g. one million writes per second.
e.g. -
-Real-time weather data (min. temperature, max. temperature, air pressure etc.) collected through sensors at multiple locations.
-Log files of web servers for data analysis.

#Document Oriented databases have following characteristics: (MongoDB - a popular document-oriented database, open source)
1] Supports flexible schema, i.e. each document can have different attributes.
2] Provides a rich query language for storing, fetching, modifying and deleting data. This allows search and filter by any attribute value.
3] Ensures rapid application development e.g. In a fast changing retail store scenario, where products and their attributes are frequently changing.
4] Are suitable for storing data that does not require frequent updates, but is read many times.

#Graph Database characteristics:
1] Graph databases are suited for data that are heavily interconnected through relationships.
2] Graphs do not need joins for querying.
3] Graph databases use graph theory for traversal. It improves performance by keeping track of and thereby skipping nodes already visited.
4] Graph databases provide Atomicity, Consistency, Isolation and Durability (similar to SQL).

Cypher Query Language provided by Neo4j is used to query Graph Database:
e.g. - CREATE (u: Users { userid : 1, username : "Priya" });
MATCH (u: Users) WHERE u.username = "Priya" RETURN u;

Different NoSQL databases and relational databases complement each other as they are suitable for different scenarios. Increasingly applications are being designed to use multiple types of databases leveraging each for their strength in specific areas. This is called "polyglot persistence" (Using different databases for different purposes in a same project).

#MongoDB Document Oriented
Cassandra Column Family
Neo4J Graph Database
Oracle RDBMS
Riak Key Value Store

#MongoDB:
->CRUD operations are performed through insert, find, update and remove operations on Collection object, NOT NULL, UNIQUE, FOREIGN KEY and CHECK constraints are not supported
->Joins and Subquery are not supported
->INSERT : db.emp.insert([ {_id: 2, ename: "Ethan", designation: "Manager", salary: 90000},
{_id: 3, ename: "Emily", designation: "Analyst", salary: 25000} ]);
->NOTE: If _id field is not specified then it is automatically generated with a unique value.

->Retrive data:
db.emp.find();
db.emp.find({designation: "Analyst"});
db.emp.find({}, {ename: 1, salary: 1});
db.emp.find({}, {ename: 1, salary: 1, _id: 0}); #In order not to display _id field
db.emp.find({salary: {$in: [30000, 90000]}});
db.emp.find({$or: [{designation:"Analyst"}, {ename:"Ethan"}]}, {ename: 1, salary: 1});

->Update:
db.emp.update({_id: 1}, {$set: {salary: 35000}});
db.emp.update({_id: 2}, {$set: {dept: "HR"}}); #Adding new Fields
db.emp.update({_id: 4}, {$set: {designation: "Analyst", dept: "ETA"}});
db.emp.update({}, {$set: {salary: 10000}}, {multi: true});
db.emp.update({_id: 3}, {ename: "Mark", designation: "Manager"});

->Delete:
db.emp.remove({_id: 4});
db.emp.remove({salary: {$lt: 30000}});
db.emp.remove({});

#Which of the following pairs of queries are equivalent under all circumstances?
1]SELECT DeptNo FROM Emp GROUP BY DeptNo;
SELECT DISTINCT DeptNo FROM Emp;

2]SELECT D.* FROM Emp E, Dept D WHERE D.DeptNo = E.DeptNo
SELECT * FROM Dept WHERE DeptNo IN (SELECT DeptNo from Emp)

3]SELECT AVG(DeptNo) As AVG FROM Emp;
SELECT SUM(DeptNo) / COUNT(*) AS AVG FROM Emp;

4]SELECT COUNT(*) - COUNT(DeptNo) FROM Emp;
SELECT COUNT(CASE WHEN DeptNo IS NULL THEN 0 ELSE NULL END) FROM Emp;

ANS - 1,4

Monday, 27 November 2017

Why most people are RIGHT -HANDED?

Many animals show a preference for one side of the body over another, but the split between right- and left-handed varies.

Seven out of 10 chimpanzees are right-handed.
Almost all kangaroos are left-handed.
In cats, males are nearly all left-handed and females are nearly all right-handed.

Humans have a higher proportion of right-handers than any species, with left-handers making up just 10 per cent of the population.

The perfect reason is, "we are a tool-using species", and also highly social. We make use of things that are build by others. So, to match the hand pattern, people use same handed versions.
The very earliest flint tools, around two million years ago, don’t show a strong bias towards left- or right-handed versions.
But it’s a big advantage if you can use the tools someone else has made, and, from about 1.5 million years ago, we seem to have standardised on the right handed versions. It’s not exactly clear why right-handedness won, but it may be that one side of our brain was already specialised for fine-motor control. One theory why left-handedness hasn’t been completely eliminated is that it provides an advantage in combat, precisely because it is rarer, and, therefore, unexpected. You can see this today in sports like tennis, where left-handed professionals are more common than in the general population.

Could JUPITER Become a Star?

Jupiter is often called a ‘failed star’ because, although it is mostly comprise of hydrogen like most normal stars, it is not gigantic enough to commence "thermo nuclear reactions" in its core and thus become a ‘real star’.

But the term ‘failed star’ is a bit of a misnomer. According to Theory, any object at all could be made into a star, simply by adding enough matter to it. With enough mass, the internal pressure and temperature of the object will reach the threshold needed to start thermo nuclear reactions. That threshold is the least for the simplest element, hydrogen. In order to turn Jupiter into a star like the Sun, for example, you would have to add about 1,000 times the mass of Jupiter.

But, to make a cooler ‘red dwarf’, you would only need to add about 80 Jupiter masses. Although the exact numbers are still a bit uncertain, it is possible that a ‘brown dwarf’ could still form (in which deuterium, rather than hydrogen, fuses in the star’s core) with only about 13 Jupiter masses.

So, Jupiter cannot and will not spontaneously become a star, but, if a minimum of 13 extra Jupiter-mass objects happen to collide with it, there is a chance it will.

The Magic of Fibonacci numbers

Let us look in the real world to find the traces of Fibonacci. In every bee hive, there is one female bee (queen) who lays all the eggs. If an egg is not fertilized, it ultimately hatches into a male bee, called a drone.

If an egg is fertilized by a male bee, then the egg produces a female bee (i.e, worker bee), who doesn't lay any eggs herself (unless she is fed something called royal jelly, in that case she will turn into a queen to challenge the existing one or fly off and start a new hive).

Now let's look at the family tree of a male. If we will count the numbers of parent, grand parent, great grandparents, great-great grandparents, great-great-great grandparents, great-great-great-great grandparents and so on of a male bee, we will find the Fibonacci sequence. 1, 1, 2, 3, 5, 8, 13...

These numbers are then used to derive the Golden Ratio. The Golden Ratio is 1.618. It appears naturally in almost all aspects of life i.e, family trees in bunnies, petals on a rose, body ratio’s in humans. It is said that people that are considered ‘naturally beautiful’ have a face and body that follows the golden ratio. Both the Fibonacci numbers and the Golden Ratio appear in Honeybees.
If you divide the number of female bees by the number of male bees you get 1.618, the golden ratio. This mathematical sequence work for any honeybee hive at any give time. Commonly, honeybee hives are always used to explain the Fibonacci Sequence and Golden Ratio.

Friday, 24 November 2017

Artificial Intelligence vs Intelligence : PILOT PROJECT

An expert human pilot was successfully able to beat flying drones controlled by artificial intelligence (AI) systems in a race organised by NASA.

ABOUT THE RACE

Jet Propulsion Laboratory (JPL), NASA researchers, raced AI-controlled drones against world-class pilot "Ken Loo". The team built three custom drones— dubbed Batman, Joker and Nightwing —and developed the complex algorithms the drones needed to fly at high speeds while avoiding obstacles.

WHAT RESEARCHERS FOUND

AI flies drones smoothly around acourse, whereas human pilots tend to accelerate aggressively, so their path is jerkier Compared to Loo, the drones flew more cautiously but consistently. Loo attained higher speeds and was able to perform impressive aerial feats. However, he was limited
by exhaustion, something that the AI-piloted drones did not have to deal with.
On the other hand, the drones sometimes moved so fast that motion blur caused them to lose track of their surroundings, said researchers.
11.1 seconds was Loo’s average flight time, compared to the autonomous drones, which averaged 13.9 seconds.
“We pitted our algorithms against a human, who flies a lot more by feel". ROBREID JPL, the project’s task manager.

WHY THIS IS IMPORTANT

Camera-based localisation and mapping technologies have various potential applications. These technologies might allow drones to check on inventory in warehouses or assist search and rescue operations at disastersites. They might even be used eventually to help future robots navigate the corridors of a space station.

(Source: HT)

Saturday, 18 November 2017

Techniques to Nurture the Start-up

4 Simple Techniques to Hone the StartUp Trends

If you want to be your true self and become the authority in your field, consider these four steps.

1. Take a stand on your message and mission.

Don’t filter what you share based on how other people may view you. Not everyone is going to “like” you or what you have to say. You didn’t get into business to be liked by everyone, did you? The more you edit, the more you sit on the sidelines. The less you edit, the more you stand out and become an authority in your field.
You can’t be afraid to piss people off. Be the person that has the audacity to speak their mind. Be the person that has the balls to say what others won’t. These are the traits that will make you and your business stand out from the crowd and not get lost in translation.

2. Let other people be your ambassadors.

This means less talk, more action. Every single person you encounter needs to see who you really are. Be your true self, and your message will become real and authentic. If you are being fake and fluffy, anyone with half a brain can see directly through you. Everyone wants to find a voice they can relate to, and you can be the person everyone talks about.
Once people start talking about you and your business, word spreads fast. They can either say, “Wow this message is so fake. This is such B.S.” or they can say, “Wow this person is in my brain, they understand my pain and my plight so well. This is someone I can trust.” Which one would you prefer?
If you want people to talk about you in a positive manner, be yourself. Be real and keep your word no matter what.

3. Create trends.

Have original ideas. Be a leader, not a follower. Don’t try to fit in. In fact, do the opposite, because the whole point is to stand out, right? In order to create a trend, the idea cannot come from anywhere else except your brain.
Don’t be influenced by what other people are doing or saying. If you want to create something unique, you can’t be flooded with what you think other people want. Create what you want. Create what you need, not what you think others want or need. That’s where true originality comes from. It can be scary to say what you think. Everyone has that fear of rejection, fear of alienation, fear that you are not going to fit in.
But you didn’t get into business to be scared. You got into business to make a difference and provide a solution to others -- your solution, not a solution based on outside influences.

4. Speak directly to your target audience.

Your audience is you. Be bold. Don’t be afraid to speak your mind. Put yourself out there. People respect that shameless aspect. Everyone walks around with a certain type of pain or joy, and if you put a voice to it, that can be a very powerful tool of influence.
Don’t try to reach everyone. Remember, your message isn’t for everyone to hear. Every single person on this planet doesn’t think the same. It is your job to reach your tribe of people. If certain people don’t like you, that is a good thing. Your message is customized. It shouldn’t be general. It shouldn’t reach everyone. If you are trying to include everyone, that will be your biggest downfall.

So, Lets Nuture your StartUp...

Sunday, 12 November 2017

Serverless: Inheritence of Cloud Computing

Serverless computing is a cloud computing execution model in which the cloud provider dynamically manages the allocation of machine resources. Pricing is based on the actual amount of resources consumed by an application, rather than on pre-purchased units of capacity. It is a form of utility computing.

Serverless computing still requires servers. The name "serverless computing" is used because the server management and capacity planning decisions are completely hidden from the developer or operator. Serverless code can be used in conjunction with code deployed in traditional styles, such as microservices. (Microservices is a variant of the service-oriented architecture (SOA) architectural style that structures an application as a collection of loosely coupled services.) Alternatively, applications can be written to be purely serverless and use no provisioned services at all.

AWS Lambda, introduced by Amazon in 2014, was the first public cloud vendor with an abstract serverless computing offering. AWS Lambda initially supported only Node.js. It now supports Python, Java, and C#, and code written in other languages can be invoked indirectly via Node.js.

Serverless computing is more cost-effective than renting or purchasing a fixed quantity of servers, which generally involves significant periods of underutilization or idle time. It can even be more cost-efficient than provisioning an autoscaling group, due to more efficient bin-packing of the underlying machine resources.

Some Odd Knowledge:

In 2016, "WHATEVER" was voted the most annoying word in English in a Poll by the Marist Institute of Public Opinion for the eighth year running.
38% Americans reported that whatever annoys them more than any other word used in conversation.