AWS Certification Exam Cram Notes (2020)
Here are my exam cram notes for the AWS Certified Cloud Practitioner, AWS SysOps Administrator Associate, and AWS Solutions Architect Associate exams.
This content is provided raw, as it was written back in 2020 (possible mistakes and all). This is more for my own reference in the future, perhaps when I need to re-take the exams once the certs expire.
A new version of the AWS Web Application Firewall was released in November 2019. With AWS WAF Classic you create "IP match conditions", whereas with AWS WAF (new version) you create "IP set match statements". Look out for the wording on the exam.

WAF vs Shield
- Shield protects against Layer 3 and 4 DDoS attacks.
- WAF protects against Layer 7 DDoS attacks.
- WAF also protects against Cross-Site Scripting (XSS), SQL injection, and command injection attacks.

EC2 Auto Scaling VS AWS Auto Scaling
EC2 Auto Scaling works in conjunction with the AWS Auto Scaling service to provide a predictive scaling ability to your Auto Scaling groups.
- You should use EC2 Auto Scaling if you only need to scale Amazon EC2 Auto Scaling groups, or if you are only interested in maintaining the health of your EC2 fleet.
- You should use AWS Auto Scaling to manage scaling for multiple resources across multiple services. AWS Auto Scaling lets you define dynamic scaling policies for multiple EC2 Auto Scaling groups or other resources using predefined scaling strategies.
Further reading: https://aws.amazon.com/ec2/autoscaling/faqs

ASG: Cooldown Period VS CloudWatch Alarm Evaluation Period
- The cooldown period is a configurable setting for your Auto Scaling group that helps ensure it doesn't launch or terminate additional instances before the previous scaling activity takes effect. This helps if you notice that Auto Scaling is scaling the number of instances up and down multiple times in the same hour.
-- After the Auto Scaling group dynamically scales using a simple scaling policy, it waits for the cooldown period to complete before resuming scaling activities.
- The CloudWatch alarm evaluation period is the number of the most recent data points to evaluate when determining alarm state. This also helps, because you can increase the number of data points required to trigger an alarm if you notice that Auto Scaling is scaling the number of instances up and down multiple times in the same hour.

Redshift VS Redshift Spectrum:
Redshift: Cloud data warehouse. Relational, OLAP-style database. It's a data warehouse built for the cloud, to run the most complex analytical workloads in standard SQL and existing Business Intelligence (BI) tools. Redshift is a columnar data warehouse DB that is ideal for running long, complex queries. Redshift can also improve performance for repeat queries by caching the result and returning the cached result when queries are re-run. Dashboard, visualization, and business intelligence (BI) tools that execute repeat queries see a significant boost in performance due to result caching.
Redshift Spectrum: A feature of Amazon Redshift. Spectrum is a serverless query processing engine that allows you to join data that sits in Amazon S3 with data in Amazon Redshift. Amazon Redshift Spectrum allows you to directly run SQL queries against exabytes of unstructured data in Amazon S3. No loading or transformation is required.

Athena VS Redshift Spectrum:
- Athena is truly serverless.
- Redshift Spectrum is not truly serverless, as it requires an existing Redshift cluster, which is based on EC2 instances.
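To make the "truly serverless" point concrete, here is a minimal boto3 sketch (the database, table, and bucket names are hypothetical) that runs SQL directly against data in S3 with Athena, with no cluster to provision:

```python
import boto3

athena = boto3.client("athena", region_name="us-east-1")

# Run a SQL query directly against data catalogued over S3.
# There is no cluster to manage; you pay per query / per data scanned.
response = athena.start_query_execution(
    QueryString="SELECT status, COUNT(*) FROM web_logs GROUP BY status",
    QueryExecutionContext={"Database": "my_logs_db"},  # hypothetical Glue/Athena database
    ResultConfiguration={"OutputLocation": "s3://my-athena-results-bucket/"},  # hypothetical bucket
)
print(response["QueryExecutionId"])
```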
Cognito: User Pool VS Identity Pool
1. User Pool: Authenticate and get tokens. A user pool is a user directory in Amazon Cognito. With a user pool, users can sign in to web or mobile apps through Amazon Cognito, or federate through a third-party identity provider (IdP).
2. Identity Pool: Exchange tokens for AWS credentials. With an identity pool, users can obtain temporary AWS credentials to access AWS services, such as Amazon S3 and DynamoDB. Identity pools provide temporary AWS credentials for users who are guests (unauthenticated) and for users who have been authenticated and received a token. An identity pool is a store of user identity data specific to your account.
3. Access AWS services with the credentials.

Cognito VS "STS and SAML"
- The AWS Security Token Service (STS) is a web service that enables you to request temporary, limited-privilege credentials for IAM users or for users that you authenticate (such as federated users from an on-premises directory).
-- Federation (typically Active Directory) uses SAML 2.0 for authentication and grants temporary access based on the user's AD credentials. The user does not need to be a user in IAM.
-- When you develop a custom identity broker, you use STS.
- Amazon Cognito is used for authenticating users to web and mobile apps, not for providing single sign-on between on-premises directories and the AWS Management Console.

OpsWorks: managed Puppet/Chef
OpsWorks Stacks VS OpsWorks for Chef Automate
- OpsWorks for Chef Automate is a **fully-managed configuration management service** that helps you instantly provision a Chef server and lets the service operate it, including performing backups and software upgrades.
- The OpsWorks Stacks service helps you **model, provision, and manage your applications** on AWS using the embedded Chef Solo client that is installed on Amazon EC2 instances on your behalf.

Clearing up some EBS confusion about TYPES and FAMILIES:
- All EBS VOLUME TYPES (ex: gp2, io2, io1, st1, sc1) support encryption, and all INSTANCE FAMILIES (ex: general purpose, memory optimized, accelerated computing) now support encryption.
- Not all INSTANCE TYPES support encryption (usually older generations like t1.micro and all the m1.* types; it appears the only current-generation type that does not support encryption is cc1.4xlarge).

Cross-zone load balancing VS setting up an ASG in multiple subnets
- Setting up an ASG in multiple subnets (which is an ASG feature) will distribute traffic evenly across Availability Zones, but if the AZs have different numbers of instances, the load will not be the same on all instances (ex: AZ A with 2 instances will have 50% on one and 50% on the other, while AZ B with 4 instances will have 25% on each).
- Enabling cross-zone load balancing (which is an ELB feature) will ensure that even if you have AZ A with 4 instances and AZ B with 6 instances, each instance gets only 10% of the traffic.
Visual aid: https://www.youtube.com/watch?v=btu2mMWTJdI
Please note that cross-zone load balancing is enabled by default on the Application Load Balancer.

Route53 Alias vs A vs AAAA vs CNAME records
- Alias records are used ONLY to direct to AWS resources (NOT on-prem).
-- Elastic Beanstalk, API Gateway, VPC interface endpoint, CloudFront, ELB, AWS Global Accelerator, S3 bucket (only when set up as a static website), another Route53 record (in the same hosted zone).
--- The above are it. So NOT Elasticache, NOT EFS/EBS, NOT RDS, etc...
- CNAMEs cannot be used for zone apex (naked domain) records (A, AAAA, or Alias (only for AWS resources) are used for those), but otherwise CNAMEs can be used for any other records (examples: on-prem resources).
- A or AAAA maps a DNS name to an IP address, but you cannot obtain the IP of an ELB, so you must use an Alias instead.
- A is IPv4; AAAA is IPv6.

SQS Standard VS FIFO queues
- Standard queue: messages are delivered at least once (possible duplicates).
- FIFO queue: messages are delivered exactly once (no duplicates), in order.
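A minimal boto3 sketch of the FIFO side of that comparison (queue name and IDs are hypothetical): FIFO queues need a ".fifo" suffix, a MessageGroupId for ordering, and a deduplication ID (or content-based deduplication) for the exactly-once behaviour.

```python
import boto3

sqs = boto3.client("sqs")

# FIFO queue names must end in ".fifo" (hypothetical queue).
queue_url = sqs.create_queue(
    QueueName="orders.fifo",
    Attributes={"FifoQueue": "true"},
)["QueueUrl"]

# Messages with the same MessageGroupId are delivered in order;
# MessageDeduplicationId prevents duplicates within a 5-minute window.
sqs.send_message(
    QueueUrl=queue_url,
    MessageBody='{"order_id": 42}',
    MessageGroupId="customer-123",
    MessageDeduplicationId="order-42",
)
```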
SNS vs SQS:
- Both can be used to decouple things. However, A Cloud Guru really emphasizes that when you hear the word "decouple", think SQS. But in Neal Davis' Udemy practice tests, one question in particular is there to trick you by mentioning both "decoupling" and "invoking Lambda".
- SQS cannot invoke a Lambda function, but SNS can.
- SQS is pull-based (ex: Lambda pulling messages from an SQS queue) and SNS is push-based (ex: SNS pushing to Lambda ("SNS invoking Lambda to do work" would be more accurate)).

SQS vs SWF
- Amazon SQS does NOT keep track of all tasks and events in an application. With SQS, you must implement your own application-level tracking, especially if your application uses multiple queues.
- Amazon SWF does provide tracking of tasks and events.

SQS vs Amazon MQ
- Amazon MQ is similar to SQS but is used for existing applications (using an existing on-prem message queue technology compatible with Amazon MQ) that are being migrated into AWS.
- SQS should be used for new applications being created in the cloud.

Database replication:
- Synchronous replication (Active to Standby)
- Asynchronous replication (to a Read Replica)

Launch Configuration VS Launch Template:
A launch template is similar to a launch configuration, which an Auto Scaling group typically uses to launch EC2 instances.
***However, defining a launch template instead of a launch configuration allows you to have multiple versions of a template.***
***Launch templates also enable the storing of settings such as AMI ID, instance type, key pairs, and security groups.***
AWS recommends using launch templates instead of launch configurations to ensure that we can leverage the latest features of Amazon EC2, such as T2 Unlimited instances.
References: https://docs.aws.amazon.com/autoscaling/ec2/userguide/LaunchTemplates.html
Auto Scaling groups can be edited once created (however, launch configurations cannot be edited).
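A minimal boto3 sketch of that recommendation (the AMI ID, key pair, security group, and subnet IDs are hypothetical): create a versioned launch template that stores the instance settings, then point an Auto Scaling group at it.

```python
import boto3

ec2 = boto3.client("ec2")
autoscaling = boto3.client("autoscaling")

# The launch template stores AMI ID, instance type, key pair, and security groups,
# and supports multiple versions (unlike a launch configuration).
ec2.create_launch_template(
    LaunchTemplateName="web-template",                # hypothetical name
    LaunchTemplateData={
        "ImageId": "ami-0123456789abcdef0",           # hypothetical AMI
        "InstanceType": "t3.micro",
        "KeyName": "my-key-pair",                     # hypothetical key pair
        "SecurityGroupIds": ["sg-0123456789abcdef0"], # hypothetical SG
    },
)

# The ASG references the template by name and version ("$Latest" tracks new versions).
autoscaling.create_auto_scaling_group(
    AutoScalingGroupName="web-asg",
    LaunchTemplate={"LaunchTemplateName": "web-template", "Version": "$Latest"},
    MinSize=2,
    MaxSize=6,
    VPCZoneIdentifier="subnet-aaaa1111,subnet-bbbb2222",  # hypothetical subnets in two AZs
)
```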
RDS Multi-AZ and the ability to use the standby instance for reads
- RDS Multi-AZ is used for Availability, not Performance. Read Replicas are for performance.
-- Multi-AZ deployments are not a read scaling solution; you cannot use a standby replica to serve read traffic (**except** RDS's internal process of using the Multi-AZ standby replica during backups so as to not affect the primary instance). The standby is only there for failover (and during RDS backups).
- An EXCEPTION is Aurora. Aurora Replicas are independent endpoints in an Aurora DB cluster, best used for scaling read operations and increasing availability.
-- An Aurora Replica is both a standby in a Multi-AZ configuration and a target for read traffic. With Aurora, it IS possible to update the application to read from the Multi-AZ standby instance.

RDS VS DynamoDB:
- DynamoDB lends itself better to supporting stateless web/app installations. Use cases for DynamoDB include storing JSON data, BLOB data, and web session data.
- RDS is for storing data that requires relational joins and highly complex updates.

DynamoDB with Global Tables vs Aurora Global Database:
- An Aurora global database consists of one primary AWS Region where your data is mastered, and up to five (5) **READ-ONLY** secondary AWS Regions.
- DynamoDB global tables provide a fully managed solution for deploying a multi-region, multi-master (read-write, active-active) database, without having to build and maintain your own replication solution.
-- This is the only solution presented that provides an active-active configuration where reads and writes can take place in multiple regions with full bi-directional synchronization.

Aurora Global Database VS Aurora Replicas:
- Aurora Replicas are independent (multi-master) endpoints in an Aurora DB cluster, best used for scaling read operations AND increasing availability **within a region**.
-- Remember that Aurora on RDS has Multi-Master; MySQL on RDS does not.
- Amazon Aurora Global Database is NOT suitable for scaling read operations within a region; it is used **across regions** instead. It is a new feature in the MySQL-compatible edition of Amazon Aurora, designed for applications with a global footprint. It allows a single Aurora database to span multiple AWS regions, with fast replication to enable low-latency global reads and disaster recovery from region-wide outages.

Scheduled Scaling VS Step Scaling VS Simple Scaling:
- Step scaling policies increase or decrease the current capacity of your Auto Scaling group based on a set of scaling adjustments, known as step adjustments. The adjustments vary based on the size of the alarm breach. This is more suitable for situations where the load is unpredictable.
- Scaling based on a schedule allows you to set your own scaling schedule for predictable load changes. To configure your Auto Scaling group to scale based on a schedule, you create a scheduled action. This is ideal for situations where you know when, and for how long, you are going to need the additional capacity.
- AWS recommends using step scaling over simple scaling in most cases. With simple scaling, after a scaling activity is started, the policy must wait for the scaling activity or health check replacement to complete and the cooldown period to expire before responding to additional alarms (in contrast to step scaling). Again, step scaling is more suitable for unpredictable workloads.

Amazon CloudFront VS AWS Global Accelerator VS Amazon S3 Transfer Acceleration:
AWS Global Accelerator and Amazon CloudFront are separate services that use the AWS global network and its edge locations around the world.
- CloudFront improves performance for both cacheable content (such as images and videos) and dynamic content (such as API acceleration and dynamic site delivery).
-- You CANNOT configure a database as a custom origin in CloudFront.
- Global Accelerator improves performance for a wide range of applications over TCP or UDP by proxying packets at the edge to applications running in one or more AWS Regions.
-- Global Accelerator is a good fit for non-HTTP use cases, such as gaming (UDP), IoT (MQTT), or Voice over IP, as well as for HTTP use cases that specifically require static IP addresses or deterministic, fast regional failover.
- Amazon S3 Transfer Acceleration is used for speeding up UPLOADS of data to Amazon S3 by using the CloudFront network. It is not used for downloading data.

Elastic Beanstalk: Compute: managed web application platform. Upload your code (PHP, Ruby, Node.js, etc.) and AWS manages the back-end. You focus on your web application.

ElastiCache: Managed Memcached or Redis instances. In-memory key/value store database (more OLAP than OLTP).
ElastiCache for Memcached: does NOT offer native encryption.
ElastiCache for Redis: offers native encryption. Offers persistence. Pub/Sub, sorted sets, and an in-memory data store (also session state data: https://aws.amazon.com/getting-started/hands-on/building-fast-session-caching-with-amazon-elasticache-for-redis/).
*****However, ElastiCache is only a key-value store and therefore cannot store relational data.
*****Also: ElastiCache cannot be used as an Internet-facing web front-end (that would be CloudFront).
*****Also: You cannot use Amazon ElastiCache to cache API requests.
*****ElastiCache is also great for caching content between S3 and an internal media server (unlike CloudFront, which caches content between S3 and external clients across the globe).
***Remember: RE-dis has RE-plication (meaning that only Redis is Highly Available or can have read replicas).
***Remember: M-emcached is M-ultithreaded. Redis is not. Other than simplicity, the multithreading is the only thing that Memcached has over Redis.
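A minimal boto3 sketch of those Redis-only features (the replication group ID and node type are hypothetical): a replication group with a replica and automatic failover (high availability) plus at-rest and in-transit encryption, none of which Memcached offers.

```python
import boto3

elasticache = boto3.client("elasticache")

# Redis replication group: one primary + one replica with automatic failover,
# with native encryption at rest and in transit (not available for Memcached).
elasticache.create_replication_group(
    ReplicationGroupId="session-cache",          # hypothetical ID
    ReplicationGroupDescription="Redis session cache with HA and encryption",
    Engine="redis",
    CacheNodeType="cache.t3.micro",
    NumCacheClusters=2,                          # primary + 1 read replica
    AutomaticFailoverEnabled=True,
    AtRestEncryptionEnabled=True,
    TransitEncryptionEnabled=True,
)
```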
Elastic MapReduce (EMR):
- Big data frameworks; managed Hadoop/Spark/Hive/Presto.
- Amazon EMR is a web service that enables businesses, researchers, data analysts, and developers to easily and cost-effectively process vast amounts of data. EMR utilizes a hosted Hadoop framework running on Amazon EC2 and Amazon S3.
- EMR doesn't natively support SQL.

Kinesis: Amazon Kinesis VS Kinesis Firehose:
- Amazon Kinesis Firehose (delivery streams, records of data, destinations, no data persistence) has a 1-minute lag, therefore it cannot be used to solve any real-time requirements.
-- Amazon Kinesis Firehose is the easiest way to reliably load streaming data into data lakes, data stores, and analytics tools.
-- Kinesis Firehose can allow transformation of data, and it then delivers the data to supported services.
-- Firehose destinations include: Amazon S3, Amazon Redshift, Amazon Elasticsearch Service, and Splunk. NOT RDS. RDS is a transactional database and is not a supported Kinesis Firehose destination.
- Kinesis Data Streams (shards, consumers, data persistence) are best suited for real-time data processing, and they have a size limit of 1 MB per record, which would be too low for high-quality images.
-- Kinesis Data Streams enables you to build custom applications that process or analyze streaming data for specialized needs. It enables real-time processing of streaming big data and can be used for rapidly moving data off data producers and then continuously processing the data.
-- Kinesis Data Streams STORES data for later processing by applications (key difference from Firehose, which delivers data directly to AWS services).
- Kinesis Data Analytics: used to process streaming data in real time with standard SQL without having to learn new programming languages or processing frameworks.
- ****Erik here: Wording in the exam may be confusing... for example, "loading and transforming (before putting into another source)" would be Kinesis Firehose, but "loading and processing (by applications)" would be Kinesis Data Streams.

ECS vs ECR:
- ECS (Elastic Container Service) is used to run container processes.
- ECR (Elastic Container Registry) is used to store container images.

API Gateway VS AWS AppSync:
- API Gateway is used for RESTful applications.
- AWS AppSync is used for GraphQL-powered applications.

Request Tracing: can be used to track HTTP requests from clients to targets.

STS:
- The AWS Security Token Service (STS) is a web service that enables you to request temporary, limited-privilege credentials for AWS Identity and Access Management (IAM) users or for users that you authenticate (federated users).
- You cannot configure databases like MySQL to directly use STS (for example, you cannot "configure a MySQL DB to use STS").
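A minimal boto3 sketch of the STS idea (the role ARN and session name are hypothetical): request temporary, limited-privilege credentials for a role and use them to build a scoped client instead of long-term access keys.

```python
import boto3

sts = boto3.client("sts")

# Request temporary, limited-privilege credentials for a role (hypothetical ARN).
creds = sts.assume_role(
    RoleArn="arn:aws:iam::123456789012:role/ReadOnlyAnalyst",
    RoleSessionName="analyst-session",
    DurationSeconds=3600,
)["Credentials"]

# Use the short-lived credentials instead of long-term access keys.
s3 = boto3.client(
    "s3",
    aws_access_key_id=creds["AccessKeyId"],
    aws_secret_access_key=creds["SecretAccessKey"],
    aws_session_token=creds["SessionToken"],
)
print(s3.list_buckets()["Buckets"])
```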
AWS Config:
- AWS Config gives you a view of the configuration of your AWS infrastructure and compares it for compliance against rules you can define (or choose, such as s3-bucket-public-read-prohibited or s3-bucket-public-write-prohibited). https://aws.amazon.com/blogs/aws/aws-config-update-new-managed-rules-to-secure-s3-buckets/
- AWS Config logs all changes to your configuration on a timeline, and it also allows you to retrace the steps via CloudTrail to see the events associated with the configuration changes.

Artifact: security and compliance DOCUMENTS. ISO; PCI; SOC. Used to demonstrate to auditors the security and compliance of the AWS infrastructure.

Inspector: Only for EC2! Not for RDS. Agent-based security/compliance assessment service; assesses apps for vulnerabilities or deviations from best practices.

Trusted Advisor: more than security; it's about advising you on cost, security, performance, fault tolerance, and SERVICE LIMITS. Trusted Advisor only has a few parts free. Trusted Advisor points out areas where you can optimize costs but doesn't provide cost and budget reports.

Cost Explorer: Choose Cost Explorer to track and analyze your AWS usage. Cost Explorer is free for all accounts and can filter by Region, purchase option, and tags, among other things.

AWS Budgets: AWS Budgets gives you the ability to set alerts when costs or usage are exceeded. AWS Cost Explorer lets you visualize, understand, and manage your AWS costs and usage over time. AWS Cost & Usage Report lists AWS usage for each service category used by an account and its IAM users, and finally, Reserved Instance Reporting provides a number of RI-specific cost management solutions to help you better understand and manage RI utilization and coverage.

GuardDuty: Intelligent (machine-learning) THREAT detection to protect AWS accounts and workloads.

AWS X-Ray helps developers analyze and debug production, distributed applications. X-Ray is a distributed tracing system. X-Ray allows transaction tracing throughout applications (ex: by modifying the code to use the X-Ray snippets, potential bottlenecks accessing different resources and endpoints can easily be detected) but would not push .NET metrics to AWS (that would be custom metrics through the CloudWatch agent). Example scenario: "An insurance company has a serverless application setup utilizing API Gateway, AWS Lambda, and DynamoDB for its web application. The engineering manager of the company has instructed the team to identify, track, and detect all potential bottlenecks related to POST method calls being performed by the AWS Lambda functions."

CloudWatch Logs Insights is the most suitable tool to perform pre-built queries on CloudWatch Logs.
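A minimal boto3 sketch of such a query (the log group name is hypothetical): Logs Insights queries run asynchronously, so you start the query and then poll for results.

```python
import time
import boto3

logs = boto3.client("logs")

# Start a Logs Insights query over the last hour of a (hypothetical) Lambda log group.
query_id = logs.start_query(
    logGroupName="/aws/lambda/orders-handler",
    startTime=int(time.time()) - 3600,
    endTime=int(time.time()),
    queryString="fields @timestamp, @message | filter @message like /ERROR/ | sort @timestamp desc | limit 20",
)["queryId"]

# Poll until the query completes, then print the matching log lines.
while True:
    result = logs.get_query_results(queryId=query_id)
    if result["status"] in ("Complete", "Failed", "Cancelled"):
        break
    time.sleep(1)

for row in result["results"]:
    print({field["field"]: field["value"] for field in row})
```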
CloudWatch Alarms VS CloudWatch Events VS CloudWatch Logs VS CloudWatch Logs Insights
- CloudWatch Alarms:
-- Alarms based on METRICS being watched.
-- Alarms can be added to dashboards.
- CloudWatch Events:
-- Events based on ACTIONS and CHANGES done to AWS services.
-- Amazon CloudWatch Events delivers a near real-time stream of system events that describe changes in Amazon Web Services (AWS) resources. Though you can generate custom application-level events and publish them to CloudWatch Events, this is not the best tool for monitoring application logs.
-- Use rules to match events and route them to other functions or streams.
-- Can schedule automated actions that self-trigger at certain times using cron or rate expressions.
-- Some services (such as GuardDuty) may not have metrics, so CloudWatch Events with SNS would be better suited, for example, for getting notified of Medium and High severity findings from GuardDuty.
- CloudWatch Logs:
-- You can use CloudWatch Logs to monitor applications and systems using log data. For example, CloudWatch Logs can track the number of errors that occur in your application logs and send you a notification whenever the rate of errors exceeds a threshold you specify.
-- Monitor, store, and access/query log files from EC2, CloudTrail, Route53, and MANY others. (For example: if you want to send logs from some AWS services to CloudWatch Logs, you must use or create a CloudWatch Logs resource policy that grants the service permission to send its logs to CloudWatch Logs. This affects Amazon API Gateway, AWS Step Functions, and Amazon Managed Streaming for Apache Kafka.)
- CloudWatch Logs Insights:
-- Perform pre-built queries on CloudWatch Logs.

Logs: VPC Flow Logs vs Access Logs vs CloudWatch Logs vs CloudTrail
- Access Logs: FOR ELB. You can enable access logs on the ALB and this will provide the information required, including requester, IP, and request type. Access logs are not enabled by default. You can optionally store and retain the log files on S3.
- VPC Flow Logs: FOR network traffic.
- CloudWatch Logs: Monitor, store, and access/query log files from EC2, CloudTrail, Route53, and MANY others (see the resource policy note above regarding API Gateway, Step Functions, and Amazon MSK).
- CloudTrail: To record API calls; calls that fail authentication, for example.

Elasticity VS Scalability:
- Elasticity: Scale out with demand (short term).
- Scalability: Scale out infrastructure (long term).

Bucket ACLs (apply to objects) vs Bucket Policies (apply to the bucket)

IOPS vs Throughput: _____________________________________________

An AWS Batch multi-node parallel job is compatible with any framework that supports IP-based, internode communication, such as Apache MXNet, TensorFlow, Caffe2, or Message Passing Interface (MPI).

When using Active Directory to authenticate to AWS, which of the following answers contains the correct steps, in the correct order?
1) The user navigates to the ADFS web server.
2) The user enters their single sign-on credentials.
3) The user's web browser receives a SAML assertion from the AD server.
4) The user's browser then posts the SAML assertion to the AWS SAML endpoint, and the AssumeRoleWithSAML API request is used to request temporary security credentials.
5) The user is then able to access the AWS Console.

IAM: Roles/Groups/Policies:
You manage access in AWS by creating policies and attaching them to IAM identities (users or roles), to groups, or to AWS resources.
You attach a policy, not a role, to a group to give the users in the group the appropriate permissions.
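A minimal boto3 sketch of that model (the group and user names are hypothetical; the policy ARN is the AWS-managed administrator policy): create a group, attach a policy to the group, and add a user so the user inherits the group's permissions.

```python
import boto3

iam = boto3.client("iam")

# Create a group and attach a policy to the GROUP (not a role).
iam.create_group(GroupName="Admins")                          # hypothetical group
iam.attach_group_policy(
    GroupName="Admins",
    PolicyArn="arn:aws:iam::aws:policy/AdministratorAccess",  # AWS-managed policy
)

# Any user added to the group automatically gets the group's permissions.
iam.add_user_to_group(GroupName="Admins", UserName="new-hire")  # hypothetical user
```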
Groups: https://docs.aws.amazon.com/IAM/latest/UserGuide/id_groups.html
- An IAM group is a collection of IAM users.
- Groups CANNOT be nested; they contain only users, not other groups.
- Note that a group is not truly an identity because it cannot be identified as a principal in an IAM policy. A group is simply a way to attach policies to multiple users at one time.
- Groups let you specify permissions for multiple users, which can make it easier to manage the permissions for those users.
- For example, you could have a group called Admins and give that group the types of permissions that administrators typically need.
- Any user in that group automatically has the permissions that are assigned to the group.
- If a new user joins your organization and needs administrator privileges, you can assign the appropriate permissions by adding the user to that group.
- Similarly, if a person changes jobs in your organization, instead of editing that user's permissions, you can remove him or her from the old groups and add him or her to the appropriate new groups.

Roles: _____________________________________________

AWS supports six types of policies:
- Identity-based policies (AKA IAM policies?): Within an IAM policy you can grant either programmatic access or AWS Management Console access to Amazon S3 resources. AWS IAM policies can be used to grant IAM users fine-grained control over Amazon S3 buckets.
- Resource-based policies (ex: bucket policy?): Use a bucket policy to only allow referrals from the main website URL.
- Permissions boundaries: _________________________________________________
- Organizations SCPs: Service control policies (SCPs) offer central control over the maximum available permissions for all accounts in your organization, allowing you to ensure your accounts stay within your organization's access control guidelines. For example, if security policy requires that use of specific API actions is limited across all accounts, create a service control policy in the root organizational unit to deny access to the services or actions.
- ACLs: _________________________________________________
- Session policies: _________________________________________________

ECS: IAM Roles for Tasks VS IAM Roles for EC2 instances VS IAM Instance Profile for EC2 instances
- IAM roles for ECS tasks enable you to secure your infrastructure by assigning an IAM role directly to the ECS task rather than to the EC2 container instance.
-- This means you can have one task that uses a specific IAM role for access to S3 and one task that uses an IAM role to access DynamoDB.
-- IAM roles can be specified at the container and task level on the EC2 launch type, and at the task level on the Fargate launch type.
-- You can only apply one IAM role to a task definition.
- With IAM roles for EC2 instances, you assign all of the IAM policies required by tasks in the cluster to the EC2 instances that host the cluster.
- An IAM Instance Profile for EC2 instances is a container for an IAM role that you can use to pass role information to an EC2 instance when the instance starts.
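A minimal boto3 sketch of a task-level role (the family name, role ARNs, and image are hypothetical): the task definition carries its own taskRoleArn, so containers in that task get those permissions rather than whatever the EC2 container instance role allows.

```python
import boto3

ecs = boto3.client("ecs")

# Register a task definition whose containers assume a task-specific role
# (e.g. S3 read access) instead of relying on the EC2 container instance role.
ecs.register_task_definition(
    family="report-generator",                                               # hypothetical family
    networkMode="awsvpc",
    requiresCompatibilities=["FARGATE"],
    cpu="256",
    memory="512",
    taskRoleArn="arn:aws:iam::123456789012:role/ReportS3ReadRole",           # permissions for the app code
    executionRoleArn="arn:aws:iam::123456789012:role/ecsTaskExecutionRole",  # pulls the image, writes logs
    containerDefinitions=[
        {
            "name": "app",
            "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/report:latest",
            "essential": True,
        }
    ],
)
```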
Access keys:
- Access keys are a combination of an access key ID and a secret access key, and you can assign two active access keys to a user at a time. These can be used to make programmatic calls to AWS when using the API in program code or at a command prompt when using the AWS CLI or the AWS PowerShell tools.

SCP vs Resource Access Manager
- To apply restrictions across multiple member accounts you must use a Service Control Policy (SCP) in the AWS Organization.
- AWS Resource Access Manager (RAM) is a service that enables you to easily and securely share AWS resources with any AWS account or within your AWS Organization. It is NOT used for restricting access or permissions.

Lambda stuff:
- Lambda tracks the number of requests, the latency per request, and the number of requests resulting in an error.
- When you deploy your Lambda function, all the environment variables you've specified are encrypted by default AFTER, but not DURING, the deployment process. They are then decrypted automatically by AWS Lambda when the function is invoked. If you need to store sensitive information in an environment variable, you should encrypt that information before deploying your Lambda function. The Lambda console makes that easier for you by providing encryption helpers that leverage AWS Key Management Service to store that sensitive information as ciphertext.
- To enable your Lambda function to access resources inside your private VPC, you must provide additional VPC-specific configuration information that includes "VPC subnet IDs" and "VPC security group IDs". AWS Lambda uses this information to set up elastic network interfaces (ENIs) that enable your function.

AWS DataSync vs AWS Database Migration Service vs AWS Schema Conversion Tool
- AWS DataSync is used to migrate data to S3, FSx, and EFS.
- AWS Database Migration Service (DMS) is used to migrate from on-prem, or within AWS from one DB type to another, or from S3 to a DB.
- AWS Schema Conversion Tool (SCT) is used for a heterogeneous database conversion/migration. SCT is used before DMS. Ex: SCT is used on a Snowball Edge, the Snowball is then shipped to AWS (the data ends up in S3), and DMS is then used to migrate from S3 to RDS or non-relational DB services.

S3 Server Access Logging VS bucket-level object logging using CloudTrail
- Server access logging offers web-style access logging (ex: fields for Object Size, Total Time, Turn-Around Time, and HTTP Referer for log records; lifecycle transitions, expirations, and restores; logging of keys in a batch delete operation; authentication failures). The S3 server access logs seem to give more comprehensive information about the requests, like BucketOwner, HTTPStatus, ErrorCode, etc.
-- Server access logging is best-effort log delivery.
-- Server access logging does not have an option for choosing management events or data events, and it does not offer log file validation.
- CloudTrail logs management events and data events, and it offers log file validation.
-- CloudTrail does not deliver logs for requests that fail authentication (in which the provided credentials are not valid). However, it does include logs for requests in which authorization fails (AccessDenied) and requests that are made by anonymous users.
- AWS Support recommends that decisions can be made using CloudTrail logs, and if you need additional information that is not available in CloudTrail logs, you can then use server access logs.
More in-depth info: https://www.netskope.com/blog/aws-s3-logjam-server-access-logging-vs-object-level-logging

Data events VS Management events:
- Data events provide visibility into the resource operations performed on or within a resource. These are also known as "data plane" operations. Data events are often high-volume activities. Example data events include:
-- Amazon S3 object-level API activity (for example, GetObject, DeleteObject, and PutObject API operations).
-- AWS Lambda function execution activity (the Invoke API).
- Management events provide visibility into management operations that are performed on resources in your AWS account. These are also known as "control plane" operations. Example management events include:
-- Configuring security (for example, IAM AttachRolePolicy API operations).
-- Registering devices (for example, Amazon EC2 CreateDefaultVpc API operations).
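A minimal boto3 sketch of turning on data events for a trail (the trail and bucket names are hypothetical): management events are logged by default, while S3 object-level and Lambda Invoke data events have to be opted into via event selectors.

```python
import boto3

cloudtrail = boto3.client("cloudtrail")

# Add data-event logging (S3 object-level and Lambda Invoke activity)
# to an existing trail; management events stay enabled as well.
cloudtrail.put_event_selectors(
    TrailName="my-org-trail",                                                 # hypothetical trail
    EventSelectors=[
        {
            "ReadWriteType": "All",
            "IncludeManagementEvents": True,
            "DataResources": [
                {"Type": "AWS::S3::Object", "Values": ["arn:aws:s3:::my-audit-bucket/"]},  # hypothetical bucket
                {"Type": "AWS::Lambda::Function", "Values": ["arn:aws:lambda"]},           # all Lambda functions
            ],
        }
    ],
)
```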
Consistency VS Concurrency
- Data consistency means that each user sees a consistent view of the data, including visible changes made by the user's own transactions and the transactions of other users.
-- Be aware of read-after-write consistency and eventual consistency.
- Data concurrency (S3, EFS) means that many users can access data at the same time.

Placement Groups:
- Cluster – packs instances close together inside an Availability Zone. This strategy enables workloads to achieve the low-latency network performance necessary for tightly-coupled node-to-node communication that is typical of HPC applications.
- Partition – spreads your instances across logical partitions such that groups of instances in one partition do not share the underlying hardware with groups of instances in different partitions. This strategy is typically used by large distributed and replicated workloads, such as Hadoop, Cassandra, and Kafka.
- Spread – strictly places a small group of instances across distinct underlying hardware to reduce correlated failures.

General Purpose SSD (gp2) volumes offer cost-effective storage that is ideal for a broad range of workloads. These volumes deliver single-digit millisecond latencies and the ability to burst to 3,000 IOPS for extended periods of time. Between a minimum of 100 IOPS (at 33.33 GiB and below) and a maximum of 16,000 IOPS (at 5,334 GiB and above), baseline performance scales linearly at 3 IOPS per GiB of volume size. AWS designs gp2 volumes to deliver their provisioned performance 99% of the time. A gp2 volume can range in size from 1 GiB to 16 TiB. For example, a 200 GiB volume would have a baseline performance of 3 x 200 = 600 IOPS. The volume could also burst to 3,000 IOPS for extended periods. As the I/O varies, this should be suitable.

OLTP vs OLAP
- OLTP: Online Transaction Processing; RDS (transactional database).
- OLAP: Online Analytics Processing; Redshift (analytics).

ASG: Unlike AZ rebalancing, termination of unhealthy instances happens first; then Auto Scaling attempts to launch new instances to replace the terminated instances.

API Gateway - differentiating the different kinds of throttling:
- Per-client throttling limits are the **only** API Gateway throttling that can apply to one or more INDIVIDUAL customers.
- Server-side throttling limits are applied across ALL clients. These limit settings exist to prevent your API (and your account) from being overwhelmed by too many requests. In this case, the solutions architect needs to apply the throttling to a single client.
- Per-method throttling limits apply to ALL customers using the same method (ex: GET, PUT). This will affect all customers who are using the API.
- Account-level throttling limits define the maximum steady-state request rate and burst limits for your entire AWS account. This does not apply to individual customers.
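A minimal boto3 sketch of per-client throttling (the API ID, stage, and key name are hypothetical): the per-client limits are expressed through a usage plan whose throttle settings apply only to the API key associated with that client.

```python
import boto3

apigateway = boto3.client("apigateway")

# A usage plan holds per-client throttle limits and is tied to a stage of the API.
plan = apigateway.create_usage_plan(
    name="bronze-tier",                                      # hypothetical plan
    throttle={"rateLimit": 10.0, "burstLimit": 20},          # requests/sec and burst for THIS client
    apiStages=[{"apiId": "a1b2c3d4e5", "stage": "prod"}],    # hypothetical API and stage
)

# The client identifies itself with an API key; attaching the key to the plan
# makes the throttle apply only to that client, not to everyone.
key = apigateway.create_api_key(name="customer-123-key", enabled=True)
apigateway.create_usage_plan_key(
    usagePlanId=plan["id"],
    keyId=key["id"],
    keyType="API_KEY",
)
```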
Making RDS publicly accessible:
- Needs to be done as a setting within the RDS DB itself. Then place the DB in a public subnet.
- The RDS instance itself does not need a public IP.
- Once it's public, to gain access, a security group will need to be created and assigned to the RDS instance to allow access (ex: if your app is on-prem, to allow access from the public IP address of your application (or on-prem firewall)).
- A "DB subnet group" is a collection of (usually private) subnets. It cannot be made "publicly accessible", and even if one or more subnets in a DB subnet group were public, the only way to make the RDS DB public is to set that up within the DB itself (and then place it in a public subnet).

Default VS New/Custom Initial Settings for Security Groups, NACLs, and DNS hostnames for EC2 instances:
- Default Security Group:
-- There is an inbound rule that allows all traffic from the security group itself.
-- There is an outbound rule that allows all traffic to all addresses.
- NEW Security Group:
-- Custom security groups do not have inbound allow rules (all inbound traffic is denied by default).
-- There is an outbound rule that allows all traffic to all addresses.
- Default NACL:
-- A VPC automatically comes with a default network ACL, which ALLOWS ALL inbound/outbound traffic.
- NEW NACL:
-- A custom NACL DENIES ALL traffic, both inbound and outbound, by default.
- Default VPC and DNS hostnames:
-- In a default VPC, instances will be assigned a public and a private DNS hostname. When you launch an instance into a default VPC, AWS provides the instance with public and private DNS hostnames that correspond to the public IPv4 and private IPv4 addresses for the instance.
- Non-default (new) VPC and DNS hostnames:
-- In a non-default VPC, instances will be assigned a private but not a public DNS hostname. When you launch an instance into a non-default VPC, AWS provides the instance with a private DNS hostname, and AWS might provide a public DNS hostname, depending on the DNS attributes you specify for the VPC and whether your instance has a public IPv4 address.

Other types of Security Groups:
- RDS DB Security Groups: ________________________
-- When you restore a DB instance to a point in time, only the default DB parameters and security groups are restored – you must manually associate all other DB parameters and SGs. This means that the **default** DB security group is applied to the new DB instance upon restore. If you need custom DB security groups applied to your DB instance, you must apply them explicitly using the AWS Management Console, the AWS CLI modify-db-instance command, or the Amazon RDS API ModifyDBInstance operation after the DB instance is available.
- EFS Security Groups: EFS Security Groups act as a firewall for EFS, and the rules you add define the traffic flow for EFS.

EFS Security: IAM vs POSIX vs EFS Security Groups:
- You can control who can ADMINISTER your file system using IAM. You DO NOT use IAM to control access to files and directories by user and group.
- You can control access to files and directories with POSIX-compliant user- and group-level permissions.
-- POSIX permissions allow you to restrict access from hosts by user and group.
- EFS Security Groups act as a firewall for EFS, and the rules you add define the traffic flow for EFS.

VPC Flow Logs: on a network interface VS on a subnet?
- For ELBs, to capture detailed information about the traffic going to and from your Elastic Load Balancer, create a VPC flow log for each network interface associated with the ELB (as opposed to a VPC flow log for the subnets in which the ELB is running, which would not be as secure, apparently). There is one network interface per load balancer subnet.
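A minimal boto3 sketch of that per-interface approach (the ENI IDs, log group, and role ARN are hypothetical): the flow logs are created on the ELB's network interfaces rather than on the subnets.

```python
import boto3

ec2 = boto3.client("ec2")

# Create flow logs on the load balancer's network interfaces (one ENI per LB subnet),
# publishing to a CloudWatch Logs group via an IAM role that can write to it.
ec2.create_flow_logs(
    ResourceType="NetworkInterface",
    ResourceIds=["eni-0aaa1111bbbb2222c", "eni-0ddd3333eeee4444f"],  # hypothetical ELB ENIs
    TrafficType="ALL",
    LogGroupName="elb-flow-logs",                                    # hypothetical log group
    DeliverLogsPermissionArn="arn:aws:iam::123456789012:role/flow-logs-role",
)
```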
Virtual Private Gateway VS Customer Gateway:
- Virtual Private Gateway (AWS side of the VPN)
- Customer Gateway (customer side of the VPN)
(See the site-to-site VPN sketch at the end of these notes for how the two fit together.)

You can associate an AWS Direct Connect Gateway with either of the following gateways:
- A transit gateway, when you have multiple VPCs in the same Region.
-- Direct Connect facility > VIF > Direct Connect Gateway > Transit Gateway association > VPC in the same region (same account?)
- A virtual private gateway (for example, if the accounts are owned by the same company but are in different regions).
-- Direct Connect facility > VIF > Direct Connect Gateway > VGW association > VPC in a different account.
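A minimal boto3 sketch of the two sides of a site-to-site VPN (the public IP, ASN, and VPC ID are hypothetical): the customer gateway represents the on-prem device, the virtual private gateway attaches to the VPC, and the VPN connection joins them.

```python
import boto3

ec2 = boto3.client("ec2")

# Customer side of the VPN: represents the on-prem router/firewall (hypothetical IP/ASN).
cgw = ec2.create_customer_gateway(
    Type="ipsec.1",
    PublicIp="203.0.113.10",
    BgpAsn=65000,
)["CustomerGateway"]

# AWS side of the VPN: the virtual private gateway, attached to the VPC.
vgw = ec2.create_vpn_gateway(Type="ipsec.1")["VpnGateway"]
ec2.attach_vpn_gateway(VpcId="vpc-0123456789abcdef0", VpnGatewayId=vgw["VpnGatewayId"])

# The VPN connection ties the two gateways together.
vpn = ec2.create_vpn_connection(
    Type="ipsec.1",
    CustomerGatewayId=cgw["CustomerGatewayId"],
    VpnGatewayId=vgw["VpnGatewayId"],
)
print(vpn["VpnConnection"]["VpnConnectionId"])
```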