AWS Machine Learning Specialty: Bite Size Recap 1/3


Configure an event tracker in Amazon Personalize to record real-time user interactions using the AWS SDK for Python (Boto3), AWS Amplify, or the AWS CLI.
If the system is trained only on historical data, its recommendations may degrade over time. Create an event tracker that records real-time user interactions to overcome this challenge.
In Amazon Personalize, a “recipe” is an algorithm preconfigured for a given use case.
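As a minimal sketch of what recording an interaction might look like with Boto3: the helper below only assembles the arguments for the `put_events` call of the `personalize-events` client. The tracking ID, user, session, and item values are hypothetical placeholders, and actually sending the event assumes Boto3 is installed and AWS credentials are configured.

```python
import uuid
from datetime import datetime, timezone


def build_put_events_request(tracking_id, user_id, session_id, item_id,
                             event_type="click"):
    """Assemble keyword arguments for the personalize-events PutEvents call."""
    return {
        "trackingId": tracking_id,   # returned when the event tracker is created
        "userId": user_id,
        "sessionId": session_id,
        "eventList": [
            {
                "eventId": str(uuid.uuid4()),
                "eventType": event_type,
                "itemId": item_id,
                "sentAt": datetime.now(timezone.utc),
            }
        ],
    }


def send_event(request, region_name="us-east-1"):
    """Send the event. Needs boto3 and valid AWS credentials (not run here)."""
    import boto3  # imported lazily so the builder above works without boto3

    client = boto3.client("personalize-events", region_name=region_name)
    return client.put_events(**request)


# Hypothetical identifiers, for illustration only.
request = build_put_events_request(
    tracking_id="example-tracking-id",
    user_id="user-123",
    session_id="session-456",
    item_id="item-789",
)
print(request["eventList"][0]["eventType"])
```

Keeping the payload builder separate from the network call makes it easy to inspect what would be sent before touching a real tracker.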

The AWS Glue Data Catalog contains references to data used as sources and targets for extract, transform, and load (ETL) jobs in AWS Glue.

Use AWS Glue for ETL jobs and Athena for SQL queries on processed data.
Athena supports various data formats, including CSV, TSV, JSON, text files, Apache ORC, and Apache Parquet.
Compression, partitioning, and using columnar formats like Apache Parquet can enhance performance and reduce costs.
Parquet and ORC support predicate pushdown: the query engine uses statistics stored with each block of data to skip blocks that cannot match the filter.
Athena charges based on the amount of data scanned per query, allowing cost savings through data partitioning, compression, and columnar conversion.
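To make the pay-per-scan point concrete, here is a rough cost estimate. The price of 5 USD per TB scanned, the 6x compression ratio, and the 3-of-30 column selection are all assumptions for illustration; check current Athena pricing for your Region, and note that per-query minimums are ignored.

```python
PRICE_PER_TB_USD = 5.00      # assumed price per TB scanned
BYTES_PER_TB = 1024 ** 4


def athena_query_cost(bytes_scanned, price_per_tb=PRICE_PER_TB_USD):
    """Estimated cost in USD of a query that scans `bytes_scanned` bytes."""
    return bytes_scanned / BYTES_PER_TB * price_per_tb


# Hypothetical table: 1 TB of CSV with 30 equally sized columns.
csv_bytes = BYTES_PER_TB

# A CSV query has to read every column of every row.
csv_cost = athena_query_cost(csv_bytes)

# Assume Parquet shrinks the data 6x and the query needs only 3 of 30 columns.
parquet_bytes = csv_bytes / 6 * (3 / 30)
parquet_cost = athena_query_cost(parquet_bytes)

print(f"CSV scan:     ${csv_cost:.2f}")
print(f"Parquet scan: ${parquet_cost:.4f}")
```

Partitioning would reduce the scanned bytes further by letting queries skip whole partitions.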

Apache Parquet is an open-source columnar storage format.
Up to 2x faster to unload and takes up to 6x less storage in Amazon S3 compared with text formats.
Can be copied from Amazon S3 into an Amazon Redshift cluster.
AWS Glue can configure and run transformation jobs that convert CSV to Parquet.
Well-suited for AWS analytics services like Amazon Athena and Amazon Redshift Spectrum.
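The predicate pushdown idea that makes columnar formats efficient can be sketched in plain Python. This is a toy model, not Parquet's actual file layout: each "row group" is a hypothetical dict holding min/max statistics, and a filter skips any group whose statistics rule it out.

```python
def make_row_groups(values, size):
    """Split values into groups and record min/max statistics for each."""
    groups = []
    for start in range(0, len(values), size):
        chunk = values[start:start + size]
        groups.append({"min": min(chunk), "max": max(chunk), "rows": chunk})
    return groups


def scan_greater_equal(groups, threshold):
    """Return matching values and how many groups actually had to be read."""
    matches, groups_read = [], 0
    for group in groups:
        if group["max"] < threshold:
            continue  # statistics prove no row can match, so skip the group
        groups_read += 1
        matches.extend(v for v in group["rows"] if v >= threshold)
    return matches, groups_read


groups = make_row_groups(list(range(100)), size=25)
matches, groups_read = scan_greater_equal(groups, 80)
print(f"{groups_read} of {len(groups)} groups read, {len(matches)} rows matched")
```

Only one of the four groups is read here because the other three have a maximum below the threshold.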

Transfer Learning:
- Network initialized with pre-trained weights; only the top fully connected layer has random weights.
- Whole network fine-tuned with new data.
Bias and Variance:
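- Bias: Error introduced when a model makes overly simple assumptions about the relationship between the features and the target variable.
- Variance: Error introduced when a model is too sensitive to small fluctuations in the training data, so it generalizes poorly to unseen data.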
- High-bias model is underfitting; high-variance model is overfitting; a balanced model has low bias and low variance.
Reducing Bias Error:
- Add more images to training data through data augmentation methods.
Neural Network Layers:
- The number of layers needed depends on the complexity of the problem.
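The bias and variance notes above can be illustrated with two deliberately extreme models on noisy synthetic data. This is a sketch under stated assumptions: the sine target, noise level, and sample sizes are arbitrary choices, and the two models are stand-ins for "too simple" and "too flexible".

```python
import math
import random

random.seed(0)


def sample(n):
    """Draw n noisy points from a sine curve (assumed ground truth)."""
    xs = [random.uniform(0, 2 * math.pi) for _ in range(n)]
    ys = [math.sin(x) + random.gauss(0, 0.3) for x in xs]
    return xs, ys


def mse(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


train_x, train_y = sample(30)
test_x, test_y = sample(200)

mean_y = sum(train_y) / len(train_y)


def constant_model(x):
    """High bias: ignores x and always predicts the training mean."""
    return mean_y


def nearest_neighbor_model(x):
    """High variance: memorizes the training set, noise included."""
    i = min(range(len(train_x)), key=lambda j: abs(train_x[j] - x))
    return train_y[i]


for name, model in [("constant", constant_model),
                    ("1-nearest-neighbor", nearest_neighbor_model)]:
    print(name,
          "train MSE:", round(mse(model, train_x, train_y), 3),
          "test MSE:", round(mse(model, test_x, test_y), 3))
```

The constant model has similar, high error on both sets (underfitting), while the nearest-neighbor model has zero training error but a clearly higher test error (overfitting).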

Representation of a joint probability distribution of random variables with a possible mutual causal relationship.
Nodes represent random variables, edges represent causal relationships, and each node has a conditional probability distribution.
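A small worked example may help here. The sketch below uses the textbook rain / sprinkler / wet grass network with made-up probabilities (all numbers are assumptions for illustration). The joint distribution is the product of each node's conditional probability given its parents.

```python
from itertools import product

# Conditional probability tables (illustrative numbers only).
P_RAIN = {True: 0.2, False: 0.8}
P_SPRINKLER_GIVEN_RAIN = {True: 0.01, False: 0.4}   # P(sprinkler on | rain)
P_WET_GIVEN = {                                      # P(wet | sprinkler, rain)
    (True, True): 0.99,
    (True, False): 0.9,
    (False, True): 0.8,
    (False, False): 0.0,
}


def joint(rain, sprinkler, wet):
    """P(rain, sprinkler, wet) = P(rain) * P(sprinkler | rain) * P(wet | sprinkler, rain)."""
    p_s = P_SPRINKLER_GIVEN_RAIN[rain]
    p_w = P_WET_GIVEN[(sprinkler, rain)]
    return (P_RAIN[rain]
            * (p_s if sprinkler else 1 - p_s)
            * (p_w if wet else 1 - p_w))


# The joint distribution sums to 1 over all assignments.
total = sum(joint(r, s, w) for r, s, w in product([True, False], repeat=3))

# Inference by enumeration: P(rain | grass is wet).
p_wet = sum(joint(r, s, True) for r, s in product([True, False], repeat=2))
p_rain_and_wet = sum(joint(True, s, True) for s in [True, False])
p_rain_given_wet = p_rain_and_wet / p_wet

print(round(total, 6), round(p_rain_given_wet, 4))
```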

Measures statistical relationship between two variables.
A value closer to 1 indicates a stronger positive correlation; closer to -1, a stronger negative correlation; near 0, little or no correlation.
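Assuming the measure meant here is the Pearson correlation coefficient, a minimal pure-Python version looks like this. The sample lists are made up to hit the three cases described above.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


positive = pearson([1, 2, 3, 4], [2, 4, 6, 8])          # y rises with x
negative = pearson([1, 2, 3, 4], [8, 6, 4, 2])          # y falls as x rises
none = pearson([1, 2, 3, 4, 5], [2, 1, 3, 1, 2])        # no linear trend

print(round(positive, 3), round(negative, 3), round(none, 3))
```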

Helps positively skewed data conform more closely to a normal distribution.
Positively skewed distribution has values clustering to the left with a longer right tail.
Normal distribution is symmetrical about the mean.
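Assuming the transformation described here is a log transformation, the effect on skewness can be checked with a few lines of Python. The data below is synthetic (exponentially growing values), chosen so that the long right tail is obvious.

```python
import math


def skewness(values):
    """Population skewness: third central moment over the cubed std deviation."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Synthetic right-skewed data: most values are small, a few are very large.
raw = [math.exp(i / 10) for i in range(50)]
logged = [math.log(v) for v in raw]

skew_before = skewness(raw)
skew_after = skewness(logged)

print("skewness before:", round(skew_before, 3))
print("skewness after: ", round(skew_after, 3))
```

The raw values have positive skewness, while the log-transformed values are evenly spaced and therefore symmetric about their mean.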

Transformation method simplifying complex differential equations into algebraic equations.
Mainly used for digital signal processing.
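Assuming the method described here is the Laplace transform, a short worked example shows how a differential equation becomes an algebraic one. The equation and the symbols a and y_0 are chosen purely for illustration.

```latex
% Definition and the derivative rule
\mathcal{L}\{f(t)\} = F(s) = \int_0^{\infty} f(t)\, e^{-st}\, dt,
\qquad
\mathcal{L}\{f'(t)\} = s F(s) - f(0)

% Differential equation: y'(t) + a\,y(t) = 0, with y(0) = y_0
s Y(s) - y_0 + a\, Y(s) = 0
\;\Longrightarrow\;
Y(s) = \frac{y_0}{s + a}

% Inverse transform gives the solution
y(t) = y_0\, e^{-a t}
```

The derivative turns into multiplication by s, so solving for Y(s) needs only algebra before transforming back.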

Happy learning! :)