The Young DBA

Posts

What is the architecture of Azure Data Lake?

What is the architecture of Azure Data Lake? Azure Data Lake is designed around two major components: Data Lake Store and Data Lake Analytics. Its main building blocks are:
1.) Internal system - YARN and WebHDFS. YARN underpins analytics, and WebHDFS provides Hadoop HDFS storage access.
2.) Analytics - U-SQL.
3.) Compute engine - HDInsight (Big Data batch processing).
Azure Data Lake Store (ADLS) serves as the hyper-scale storage layer.
What can I do with Azure Data Lake Analytics?
· Right now, ADLA is focused on batch processing, which is great for many Big Data workloads.
· Prepping large amounts of data for insertion into a data warehouse
· Processing scraped web data for science and analysis
· Churning through text, and quickly tokenizing to enable context and sentiment analysis
· Using image processing intelligence to quickly process unstructured image data
· Replacing long-running monthly batch processing with shor

What is Data Lake and Azure Data Lake

What is a data lake? A data lake is a storage repository that holds a vast amount of raw data in its native format until it is needed. While a hierarchical data warehouse stores data in files or folders, a data lake uses a flat architecture to store data. Each data element in a lake is assigned a unique identifier and tagged with a set of extended metadata tags. When a business question arises, the data lake can be queried for relevant data, and that smaller set of data can then be analyzed to help answer the question. Unlike a data warehouse, a data lake maintains data in its native format and handles the three Vs of big data (volume, velocity, and variety) while providing tools for analyzing, querying, and processing. Data lakes remove many restrictions of a typical data warehouse system by providing virtually unlimited space, unrestricted file size, schema on read, and various ways to access data (including programming, SQL-like queries, and REST calls). What is Azure Data La
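To make "schema on read" from the excerpt above concrete, here is a minimal Python sketch. The record layout, the field names, and the read_with_schema helper are all hypothetical, made up for illustration. It only shows the idea of keeping raw data untouched and applying a schema when a question is asked; it is not how any particular data lake product works.

```python
import json

# Raw records kept in their native format (here, JSON lines). Nothing is
# validated or reshaped at write time. All field names are hypothetical.
RAW_LINES = [
    '{"id": "a1", "type": "click", "ms": "120"}',
    '{"id": "a2", "type": "view"}',
    '{"id": "a3", "type": "click", "ms": "95", "extra": true}',
]


def read_with_schema(lines, schema):
    """Apply a schema at read time: keep only the schema's fields, cast
    them, and skip records that lack one of those fields."""
    for line in lines:
        record = json.loads(line)
        if all(field in record for field in schema):
            yield {field: cast(record[field]) for field, cast in schema.items()}


# The schema is chosen by the question being asked, not by the storage layer.
click_schema = {"id": str, "ms": int}
rows = list(read_with_schema(RAW_LINES, click_schema))
print(rows)
```

A different question can reuse the same raw lines with a different schema, which is the flexibility the excerpt attributes to data lakes.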

Full Text search in MongoDB

Full-text search means searching content across an entire database or the storage where the data is located. It works much like any search application: you enter keywords or phrases and get back the relevant results sorted by rank. This is a common requirement in any application with a large data set that needs a quick and efficient search method. In this post I share how text search works in MongoDB. Text search is available in almost every database, whether in the RDBMS family or the NoSQL family. MongoDB offers something different: ranking through weights on attributes. Starting from version 2.4, MongoDB shipped full-text search using text indexes as an experimental feature. This feature has since become an integral part of the product. Text search uses stemming techniques to look for specified words in string fields, dropping stop words such as a, an, the, etc. What are the features of " Mongodb Full
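The ideas in the excerpt above (dropping stop words, stemming, and ranking by per-field weights) can be illustrated with a toy Python sketch. This is not MongoDB's implementation or its scoring formula. The stop-word list, the crude stem function, the sample documents, and the weights are all assumptions chosen for illustration.

```python
import re

# Hypothetical, tiny stop-word list; real engines ship far larger ones.
STOP_WORDS = {"a", "an", "the", "of", "in", "and"}


def stem(word):
    """Very crude suffix stripping, a stand-in for a real stemmer."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def tokens(text):
    """Lowercase, split into words, drop stop words, then stem."""
    words = re.findall(r"[a-z]+", text.lower())
    return [stem(w) for w in words if w not in STOP_WORDS]


def score(doc, query, weights):
    """Sum weight * match count over every weighted field."""
    query_terms = set(tokens(query))
    total = 0
    for field, weight in weights.items():
        field_tokens = tokens(doc.get(field, ""))
        total += weight * sum(1 for t in field_tokens if t in query_terms)
    return total


def search(docs, query, weights):
    """Return (score, doc) pairs for matching docs, best score first."""
    scored = [(score(d, query, weights), d) for d in docs]
    hits = [(s, d) for s, d in scored if s > 0]
    return sorted(hits, key=lambda pair: pair[0], reverse=True)


DOCS = [
    {"title": "Indexing basics", "body": "How the database builds indexes"},
    {"title": "Backups", "body": "Indexes are rebuilt after a restore"},
    {"title": "Sharding", "body": "Splitting data across servers"},
]
WEIGHTS = {"title": 10, "body": 1}  # a title match counts ten times a body match

results = search(DOCS, "index", WEIGHTS)
for s, d in results:
    print(s, d["title"])
```

Because the title carries the larger weight, the document with the query term in its title ranks first even though both matching documents mention it in the body.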

SQL Server's 20 steps after start or restart

I have always wondered what happens when we start or restart the SQL Server service from the OS services panel. A long list of activities runs before you can connect to a database. They are listed below:
1. Server process ID allocation, along with the authentication mode.
2. Logging SQL Server messages in file 'C:\Program Files\Microsoft SQL Server\MSSQL11.SQL2012\MSSQL\Log\ERRORLOG'.
3. Reading the registry startup parameters for the master data file, the error log, and the master log file (-d, -e, -l).
4. SQL Server detects the CPUs and cores and allocates them according to licensing and configuration.
5. SQL Server starts at normal priority base (=7).
6. SQL Server detects RAM and allocates it to the server according to AWE and the configured settings.
7. SQL Server detects the node configuration: node 0: CPU mask: 0x000000000000000f:0 Active CPU mask: 0x000000000000000f:0 for the NUMA configuration.
8. Starting up database 'master' and rolling forward transactions in database 'master' (1:0).
9. SQL Server Audit starts the audits.
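The CPU mask in step 7 can be decoded with a few lines of Python. This is a minimal sketch that assumes the usual affinity-mask reading, where bit n set means logical CPU n belongs to the node. The cpus_in_mask helper is a hypothetical name, not part of any SQL Server tooling.

```python
def cpus_in_mask(mask_hex):
    """Return the CPU indexes whose bits are set in a hex affinity mask."""
    mask = int(mask_hex, 16)
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


# The mask value reported for node 0 in step 7.
cpus = cpus_in_mask("0x000000000000000f")
print(cpus)  # [0, 1, 2, 3]
```

Under that assumption, the mask 0x0f means node 0 owns four logical CPUs, numbered 0 through 3.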

Database Performance Troubleshooting Methodologies and Dimensions

When you are assigned the task of optimizing a database or tuning the performance of an application, there are several dimensions to work through, because an application can be slow for more reasons than a single page can describe. They can, however, be summarized in a table like the one below. I found this table, which shows the dimensions of database and slow-running application performance and how to start work on each.
Performance Dimensions | Percentage Values | Process Strength | Activity Strength | Remarks
Application Design and Business process | 25.00% | Long Process | Lower priority | Module wise activity.
Database Schema Design - Logical | 15.00% | Medium | Follow best practices | Required short downtime Module wise activity.
Database Maintenance | 15.00% | Quick process | Required on OLTP | Short downtime weekly or monthly.
Indexing | 15.00% | Quick process | Required on OLTP | Short downtime weekly or monthly Module wise activity.
Server Hardware (CPU/Memory/other) | 12.00% | Medium process | Follow

10 Facts about Azure Stream Analytics

Microsoft Azure Stream Analytics is a serverless, scalable complex event processing engine by Microsoft that enables users to develop and run real-time analytics on multiple streams of data from sources such as devices, sensors, web sites, social media, and other applications (wiki). 10 Facts: Stream Analytics is an event processing engine that can ingest and analyze data in real time. ASA can stream data from devices, sensors, websites, social media feeds, applications, and more. Stream Analytics does not require data to be stored first; it can analyze data in motion as well as stored data. Stream Analytics is built on a pull-based communication mode. Stream Analytics supports two input types: stream data and reference data. It has two source types: Azure Event Hubs and Azure Blob storage. The process starts with a source of streaming data that is ingested into Azure Event Hubs or Azure Blob storage. We can create a job that specifies the input source that streams data. The job also specifies a tran

Materialized view In RDBMS

If you look back 15 years at the development of databases, data warehouses, and business intelligence, you will see that many features arrived in database engines and were later deprecated. Today I will talk about one of the best features, one that helped the data warehouse grow: the "materialized view" or "indexed view". It is the heart of a traditional data warehouse system, providing optimized reporting and analytics. Oracle introduced it first in 8i; Microsoft SQL Server followed in its 2000 version. SQL Server's indexed view is better than what other RDBMSs offer. The indexed view in SQL Server is fully optimized for data warehouse queries, refreshes its data automatically, and uses schema binding. This is also one reason Microsoft SQL Server gained popularity in the data warehouse market. Postgres requires REFRESH MATERIALIZED VIEW to show updated data in reports. And if you ask a MySQL developer, they will say a materialized view is nothing but a logically insert into the base table and then i
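The difference between an automatically maintained view and one that needs an explicit refresh can be sketched in Python with the standard sqlite3 module. SQLite has no materialized views, so this example emulates one with an ordinary summary table, which is the manual approach the excerpt attributes to MySQL developers. The table names, column names, and the refresh_summary helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, amount INTEGER);
    -- Plain table standing in for a materialized view (hypothetical names).
    CREATE TABLE sales_by_region (region TEXT PRIMARY KEY, total INTEGER);
""")


def refresh_summary(conn):
    """Rebuild the summary table from the base table in one transaction,
    playing the role of an explicit refresh step."""
    with conn:
        conn.execute("DELETE FROM sales_by_region")
        conn.execute(
            "INSERT INTO sales_by_region "
            "SELECT region, SUM(amount) FROM sales GROUP BY region"
        )


def read_summary(conn):
    """Read the precomputed totals without touching the base table."""
    return dict(conn.execute("SELECT region, total FROM sales_by_region"))


conn.executemany("INSERT INTO sales VALUES (?, ?)", [("east", 10), ("west", 5)])
refresh_summary(conn)
before = read_summary(conn)

conn.execute("INSERT INTO sales VALUES ('east', 7)")
stale = read_summary(conn)   # the new row is not reflected yet
refresh_summary(conn)
fresh = read_summary(conn)   # totals now include the new row

print(before, stale, fresh)
```

Until refresh_summary runs again, reports read stale totals. That gap is what an automatically maintained view, as the excerpt describes for SQL Server's indexed view, is meant to close.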