MS SQL Server サンプルデータの AdventureWorksDW2019 スキーマを使って T-SQL を覚えるメモ。
ファクトテーブルの dbo.FactInternetSales と関連ディメンションテーブルを結合して、累計購入価格を基準にユーザーを10個のグループに分けるデシル分析をする。
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 | declare @startdate date = '2013-01-01' ; declare @enddate date = '2013-12-31' ; with t1 as ( SELECT s.[ProductKey], s.[CustomerKey], c.[CustomerAlternateKey], s.[OrderDate], s.[OrderDateKey], c.[Gender], c.[BirthDate], convert ( int , format(c.[BirthDate], 'yyyyMMdd' )) as i_birthdate, floor((s.[OrderDateKey] - convert ( int , format(c.[BirthDate], 'yyyyMMdd' ))) / 10000) as age, c.[DateFirstPurchase], c.[GeographyKey], g.[CountryRegionCode], d.[FullDateAlternateKey], convert ( varchar (7), format(d.[FullDateAlternateKey], 'yyyy-MM' )) as year_month, s.[SalesOrderNumber], s.[SalesOrderLineNumber], s.[OrderQuantity] * s.[UnitPrice] as revenue, pc.[EnglishProductCategoryName] as category, psc.[EnglishProductSubcategoryName] as subcategory, p.[EnglishProductName] as productname FROM [dbo].[FactInternetSales] as s inner join [dbo].[DimDate] as d on s.[OrderDateKey] = d.[DateKey] inner join [dbo].[DimCustomer] as c on s.[CustomerKey] = c.[CustomerKey] inner join [dbo].[DimGeography] as g on c.[GeographyKey] = g.[GeographyKey] inner join [dbo].[DimProduct] as p on s.[ProductKey] = p.[ProductKey] left join [dbo].[DimProductSubcategory] as psc on p.[ProductSubcategoryKey] = psc.[ProductSubcategoryKey] left join [dbo].[DimProductCategory] as pc on psc.[ProductCategoryKey] = pc.[ProductCategoryKey] where s.[OrderDate] between @startdate and @enddate ), t2 as ( select t1.[CustomerAlternateKey], sum (t1.[revenue]) as revenue from t1 group by t1.[CustomerAlternateKey] ), t3 as ( select t2.[CustomerAlternateKey], t2.[revenue], ntile(10) over( order by t2.[revenue] desc ) as decile from t2 ), t4 as ( select t3.[decile], sum (t3.[revenue]) as revenue, avg (t3.[revenue]) as avg_revenue, sum ( sum (t3.[revenue])) over( order by decile) as stacked_revenue, sum ( sum (t3.[revenue])) over() as total_revenue from t3 group by t3.[decile] ) select t4.[decile], t4.[revenue], t4.[avg_revenue], 100.0 * t4.[revenue] / t4.[total_revenue] as total_rate, 100.0 * t4.[stacked_revenue] / t4.[total_revenue] as stacked_rate from t4; |
結果はこうなる。

T-SQL で顧客データをデシル分析
PowerBI で表示するとこうなる。

Power BI でデシル分析を可視化
参考図書
ビッグデータ分析・活用のためのSQLレシピ
SQL Server 2016の教科書 開発編