Showing posts with label NoSQL & SQL. Show all posts
Showing posts with label NoSQL & SQL. Show all posts

Monday, June 23, 2014

SQL HACKS

SQL Zoo Practice

HACKS :

1. Running Total - Theta Join
 you can join two tables with 'on a.col1 >= b.col1' -- this is called theta join

2. Cross Join
This is when you want to create combinations and need to generate Cartesian join in the first place

3. Simple CASE WHEN clause

4. Left Outer Joins for null values

5. Add col C that shows the bigger of col A and col B  

6. Pivot!!!!!!!!!
To display rows as columns and columns as rows - using CASE WHEN(or left outer join) & Union
In the first query , there is a MAX in the clause, it is because group by needs either an aggregating function in the select statement or to use these col in group by clause!

7. Queries and web pages
 
a. AdventureWorks Schema

CustomerAW(CustomerID, FirstName, MiddleName, LastName, CompanyName, EmailAddress) 

CustomerAddress(CustomerID, AddressID, AddressType)

Address(AddressID, AddressLine1, AddressLine2, City, StateProvince, CountyRegion, PostalCode)

SalesOrderHeader(SalesOrderID, RevisionNumber, OrderDate, CustomerID, BillToAddressID, 
ShipToAddressID, ShipMethod, SubTotal, TaxAmt, Freight)

SalesOrderDetail(SalesOrderID, SalesOrderDetailID, OrderQty, ProductID, UnitPrice, 
UnitPriceDiscount)

ProductAW(ProductID, Name, Color, ListPrice, Size, Weight, ProductModelID, ProductCategoryID)

ProductModel(ProductModelID, Name)

ProductCategory(ProductCategoryID, ParentProductCategoryID, Name)

ProductModelProductDescription(ProductModelID, ProductDescriptionID, Culture)

ProductDescription(ProductDescriptionID, Description) 

above schema detail could be found  HERE

Medium Questions -- Details and playground can be found HERE
(better type queries into playground field to see the result) 

6. A "Single Item Order" is a customer order where only one item is ordered. Show the SalesOrderID and the UnitPrice for every Single Item Order.

select s.SalesOrderID,p.ListPrice UnitPrice
from ProductAW p inner join SalesOrderDetail s
on p.ProductID = s.ProductID
group by s.SalesOrderID
having count(s.ProductID) = 1


7. Where did the racing socks go? List the product name and the CompanyName for all Customers who ordered ProductModel 'Racing Socks'. 

select p.Name,c.CompanyName
from CustomerAW c inner join SalesOrderHeader s
on c.CustomerID = s.CustomerID
inner join SalesOrderDetail sd on sd.SalesOrderID = s.SalesOrderID
inner join ProductAW p on p.ProductID = sd.ProductID
inner join ProductModel pm on pm.ProductModelID = p.ProductModelID
where pm.Name = 'Racing Socks'


8.Show the product description for culture 'fr' for product with ProductID 736. 

select pd.Description
from ProductDescription pd inner join ProductModelProductDescription pmpd
on pmpd.ProductDescriptionID = pd.ProductDescriptionID
inner join ProductAW p on p.ProductModelID = pmpd.ProductModelID
where pmpd.Culture = 'fr' and p.ProductID = 736

9. Use the SubTotal value in SaleOrderHeader to list orders from the largest to the smallest. For each order show the CompanyName and the SubTotal and the total weight of the order.

select c.CompanyName, s.SubTotal, s.Freight
from CustomerAW c inner join SalesOrderHeader s
on c.CustomerID = s.CustomerID
order by s.SubTotal DESC


10. How many products in ProductCategory 'Cranksets' have been sold to an address in 'London'?

select count(distinct(sd.ProductID)) as totalNumber
from SalesOrderHeader sh inner join Address a
on a.AddressID = sh.ShipToAddressID
inner join SalesOrderDetail sd on sd.SalesOrderID=sh.SalesOrderID
inner join ProductAW p on p.ProductID = sd.ProductID
inner join ProductCategory pc on pc.ProductCategoryID = p.ProductCategoryID
where a.City = 'London' and pc.Name = 'Cranksets'



Hard Questions -- Details and playground can be found HERE

11.For every customer with a 'Main Office' in Dallas show AddressLine1 of the 'Main Office' and AddressLine1 of the 'Shipping' address - if there is no shipping address leave it blank. Use one row per customer.

select c.CompanyName,ma.AddressLine1 'Main Office Address', 
case when s.AddressType = 'Shipping' then sa.AddressLine1 else '' end as 'Shipping Address'
from CustomerAW c, CustomerAddress m, CustomerAddress s, Address ma, Address sa
where c.CustomerID = m.CustomerID and ma.AddressID = m.AddressID and c.CustomerID = s.CustomerID and s.AddressID = sa.AddressID and m.AddressType = 'Main Office' and ma.City = 'Dallas'
 
12. For each order show the SalesOrderID and SubTotal calculated three ways:
A) From the SalesOrderHeader
B) Sum of OrderQty*UnitPrice
C) Sum of OrderQty*ListPrice 

select sh.SalesOrderID,sh.SubTotal, (sd.OrderQty*sd.UnitPrice) as b, (sd.OrderQty*p.ListPrice) as c
from SalesOrderHeader sh, SalesOrderDetail sd,
ProductAW p
where sh.SalesOrderID = sd.SalesOrderID and p.ProductID = sd.ProductID
group by sh.SalesOrderID

13. Show the best selling item by value. 

select p.Name, sum(sd.OrderQty) as 'sales amout'
from SalesOrderDetail sd, ProductAW p
where sd.ProductID = p.ProductID
group by p.ProductID
order by sum(sd.OrderQty) DESC
 

14. Show how many orders are in the following ranges (in $) like the format below:
    RANGE      Num Orders      Total Value
    0-  99
  100- 999
 1000-9999
10000-

select t.range as RANGE, count(t.range) as 'Num Orders', sum(t.SubTotal) as 'Total Value' from (select case
when SubTotal between 0 and 99 then '0-99' when SubTotal between 100 and 999 then '100-999'
when SubTotal between 1000 and 9999 then '1000-9999'
else '10000-' end as range, SubTotal from SalesOrderHeader) t
group by t.range


 THIS might be helpful to solve #14


15. Identify the three most important cities. Show the break down of top level product category against city.

 

Saturday, May 24, 2014

Sharding VS Partition

今天在网上学习NoSQL的时候发现老出现partition和sharding这两个关键字, 知道他们的大致意思是把数据分分开, 但是不知道具体是什么区别, 看了quora的一个回答, 感觉终于有点明了了(http://www.quora.com/Whats-the-difference-between-sharding-and-partition). 

所谓pairition -- 分区, 是一个更general的动词, 就是把"数据分成几份" 的意思,其中有两种类型,第一种是 "horizontal partition" 水平分区 -- 也叫sharding, 另外一个叫做 "vertical partition" 垂直分区.

horizontal partition 意思很明显, 就是把一个很大的 , 在同一个table里面的数据 (在NoSQL里面应该说是同一个collection里面的documents) 按照row 来分成几个部分, 一般来说, 水平分区的时候会按照一个shard key来进行分区, 什么意思呢? 就比如我有一个table, 里面的PK是customerID, 最简单的利用shard key来分区的方法,就是把ID分成不同份的range来区分, 比如shard 1 是ID 从1-10000, shard 2 是ID从10001- 20000, 这样查找起来数据, 这样是最intuitive,当然DBA在分区的时候还会考虑更多的吗比如那些数据是经常会被一起查找的, 给予query的诉求,会为了减少I/O而把一部分常常一起作为结果的放在一个shard里面.

vertical partition 就是将一个表垂直分开,分成左右两部分.有什么应用呢? 比如,我有一个table, 里面是customerID以及若干个address, 一般来说我只需要一个biling address, 可是我不想lose track of other address, 为了方便, 我把主表和其他地址分成俩个地方储存, 用key 连接.

为什么我上面不直接把horizontal partition 叫sharding呢,其实两者还是有一定区别的, 有一个总结的不错的表,直接贴过来了~
地址:http://like-eagle.iteye.com/blog/704629



Different Types of NoSQL Database

NoSQL stands for 'not only SQL'.

并不是说完全不用到SQL的思想, 只是query的方式不同, 不需要用到传统的SQL query来进行数据的读取,还有一个很重要的特点是, NoSQL DB 不用到传统意义上的join 来进行table之间的结合,也当然就没有FK之类的概念, PK在NoSQL里面基本上是以hash key的形式存在 ( 在key-value以及document db中作为一个field -- _id存在)
现在主流的NoSQL数据库类型有四种:                                                                          

a. Key-Value storage : Oracle NoSQL DB                                                                                          

b. Document Oriented Database: MongoDB, CouchDB

c. Wide-Column Store: Hadoop/HBase, Cassandra

d. Graph Database: Neo4j

在学习了lynda上的counchDB教程以后 (笔记详见板块里的记录), 今天阅读了一下MongoDB的官方指南 < The little MongoDB book>, 下面摘录了一下关于document store 的一些特点! 以MongoDB为例,(与counchDB有所不同)

Document oriented database: CouchDB, mongodb

I will be focusing on mongoDB as example. There are 6 main components in mongoDB

1. instance: one instance in mongoDB is made up of one or  more databases. 

2. database: The concept of 'database' is similar to the one used in 
   traditional relational database.

3. collections: one database is made up of one or more collections. And a 
   collection is made up of one or more documents. As a reference, you can think a collection as a table in traditional relational db.

4. document: one document is made up of several fields. one document could be 
   compared as one row in traditional relational database.

5. fileds: fileds could be compared as columns in table, but it doesn't 
   necessarily to be the same even within one table -- 'one document', which makes it more flexible than traditional db.

6. indexes: indexes in mongodb is similar with indexes in RDBMS 

7. cursor: when you ask mongodb for data, it returns curosr, which could do a 
lot of things without actually pulling out data, things include counting and skipping ahead.   ( to be studied.)

不得不说对cursor和index的理解都不是很深入,需要再好好看一下,这里留个伏笔~