39. Cosmos DB: Time to Live
• You can set an expiry time for Cosmos DB data items
• The Time to Live (TTL) value is configured in seconds
• The system automatically deletes expired items based on the TTL value
• Deletion consumes only leftover Request Units
• If there are not enough Request Units, deletion is delayed; even so, expired items are never returned by any query (via any API) once the TTL has passed (see the sketch below)
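A minimal sketch of configuring TTL with the azure-cosmos Python SDK (v4); the endpoint, key, database, container, and item values are placeholder assumptions, not details from these notes:

```python
from azure.cosmos import CosmosClient, PartitionKey

client = CosmosClient("https://<account>.documents.azure.com:443/", "<key>")
db = client.create_database_if_not_exists("notesdb")

# default_ttl=3600: items expire one hour after their last write.
# default_ttl=-1 enables TTL but keeps items forever unless an item
# sets its own "ttl" property.
container = db.create_container_if_not_exists(
    id="events",
    partition_key=PartitionKey(path="/pk"),
    default_ttl=3600,
)

# Per-item override: this item expires after 60 seconds instead.
container.upsert_item({"id": "1", "pk": "a", "ttl": 60})
```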
40. Cosmos DB: Global Distribution
• Global distribution benefits (see the sketch below):
Performance: high availability within a region; across regions, brings data closer to consumers and reduces latency
Business continuity: protection in the event of a major failure or natural disaster
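A minimal sketch of using a globally distributed account from the client side, assuming the azure-cosmos Python SDK and placeholder account details and region names:

```python
from azure.cosmos import CosmosClient

client = CosmosClient(
    "https://<account>.documents.azure.com:443/",
    "<key>",
    # The SDK routes reads to the first available region in this list,
    # so an app running in Australia East reads from the nearby replica.
    preferred_locations=["Australia East", "Southeast Asia"],
)
```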
43. Cosmos DB: 5 Consistency Levels
• Strong --> Bounded Staleness --> Session --> Consistent Prefix --> Eventual
(Strong consistency, higher latency, lower availability --------> Weaker consistency, lower latency, higher availability)
• Strong: no dirty reads; highest latency and cost; closest to RDBMS behavior
Bounded Staleness: dirty reads possible, but staleness is bounded by a time interval and a number of updates
Session: no dirty reads for the writer (within the same session); dirty reads possible for other users
Consistent Prefix: dirty reads possible, but order is maintained; reads never see out-of-order writes
Eventual: no ordering guarantee, but replicas eventually converge
• The default consistency level is set for the entire account; individual clients can request a weaker level (see the sketch below)
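The account-wide default is set at the account level (portal/CLI); a client can then opt into a weaker level. A minimal sketch with the azure-cosmos Python SDK, assuming placeholder account details and an account default of Session:

```python
from azure.cosmos import CosmosClient

# Account default is (say) Session; this client requests the weaker
# Eventual level for latency-sensitive reads.
client = CosmosClient(
    "https://<account>.documents.azure.com:443/",
    "<key>",
    consistency_level="Eventual",
)
```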
46. Cosmos DB: Security
• Role-based access control (RBAC) / Identity and access management (IAM)
• Network security
• Access security keys
• Cross-origin resource sharing (CORS)
• Azure Private Endpoint
• Advanced security options
47. What is a Data Lake?
• A Data Lake is a repository for large quantities and varieties of structured, semi-structured, and unstructured data in their native formats. Azure Data Lake is essentially one very big container for data: you can put in as much data as you want and query it later, whenever you are ready; there is no size limit.
48. How did Data Lake Gen 2 evolve?
• Hadoop HDFS: Hadoop Distributed File System, the storage layer of Hadoop
• Fault-tolerant file system
Runs on commodity (cheap) hardware
Used by data platforms such as MapReduce, Pig, Hive, Spark, etc.
HDFS in the cloud = Data Lake Storage Gen 1
• Azure Blob Storage:
Large object storage in the cloud
Optimized for storing massive amounts of unstructured data (e.g. text or binary data)
General-purpose object storage
Cost efficient
Provides multiple access tiers
• Azure Blob Storage + Data Lake Gen 1 = Azure Data Lake Storage Gen 2 (always use Gen 2 for big data storage needs; the exception is infrastructure built with U-SQL, which is not supported in Gen 2; see the sketch below)
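A minimal sketch of what the Gen 2 hierarchical namespace looks like in code, using the azure-storage-file-datalake Python SDK; the account, file system, directory, and file names are placeholder assumptions:

```python
from azure.storage.filedatalake import DataLakeServiceClient

service = DataLakeServiceClient(
    account_url="https://<account>.dfs.core.windows.net",
    credential="<account-key>",
)
fs = service.create_file_system("raw")               # analogous to a Blob container
directory = fs.create_directory("2024/01/15")        # real directories, not name prefixes
file_client = directory.create_file("events.json")
data = b'{"event": "login"}'
file_client.append_data(data, offset=0, length=len(data))
file_client.flush_data(len(data))
```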
49. Azure Blob Storage vs Azure Data Lake
• Azure Blob Storage vs Azure Data Lake Storage (Gen 2):
Azure Blob Storage: general-purpose data storage; container-based object storage; available in every Azure region; local and global redundancy; processing performance limits for big data analytics; supports multiple Azure integrations
Azure Data Lake Storage (Gen 2): optimized for big data analytics; hierarchical namespace on top of Blob Storage; available in every Azure region; local and global redundancy; supports a subset of Blob Storage features; compatible with Hadoop
50. Azure Blob & Data Lake Security options
• Authentication: Storage Account Keys; Shared Access Signature (SAS); Azure Active Directory (Azure AD)
• Access Control: Role Based Access Control (RBAC); Access Control List (ACL)
• Network Access: Firewall and Virtual Network
• Shared Access Signature (SAS): 1. a security token string, the "SAS token"; 2. contains permissions plus a start and end time; 3. Azure does not track a SAS after creation; 4. to invalidate a SAS, regenerate the Storage Account Key used to sign it (see the sketch after this list)
• Azure Active Directory (Azure AD) and Role Based Access Control (RBAC) work side by side
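A minimal sketch of generating a short-lived, read-only SAS for a single blob with the azure-storage-blob Python SDK; the account, container, and blob names are placeholder assumptions:

```python
from datetime import datetime, timedelta, timezone
from azure.storage.blob import generate_blob_sas, BlobSasPermissions

sas_token = generate_blob_sas(
    account_name="<account>",
    container_name="reports",
    blob_name="q1.csv",
    account_key="<account-key>",
    permission=BlobSasPermissions(read=True),
    start=datetime.now(timezone.utc),
    expiry=datetime.now(timezone.utc) + timedelta(hours=1),
)
url = f"https://<account>.blob.core.windows.net/reports/q1.csv?{sas_token}"
# Azure does not track this token after creation; rotating the account
# key that signed it is the only way to invalidate it early.
```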