Source is Amazon Business Productivity
Amazon Chime SDK Voice Connector is a cloud-based service that provides SIP trunking for voice calling. Businesses and organizations use it to communicate with customers and partners. With Amazon Chime SDK Voice Connector, you can connect your existing phone system to the cloud and benefit from its scalability, flexibility, and cost-effectiveness. However, managing and processing call detail records (CDRs) can be a challenge, especially at large data volumes. One feature of Amazon Chime SDK Voice Connector is the CDR processor, which helps in recording and processing call data.
By storing this data in a data lake and analyzing it with data analytics tools, businesses can gain insight into their communication patterns. For example, they can identify which teams or departments make the most calls, which participants are most frequently involved in calls, and which times of day are busiest. This information can be used to optimize staffing levels, improve collaboration between teams, and find weaknesses in the communication infrastructure. If certain teams make a disproportionate number of calls, for instance, they may need additional resources or support. Overall, processing Amazon Chime SDK Voice Connector CDRs in a data lake can help businesses improve their communication infrastructure, optimize operations, and make data-driven decisions.
In this blog post, we will explore how to convert the CDRs vended by Amazon Chime SDK Voice Connector into Parquet format, and the benefits of doing so.
Parquet is a columnar storage format optimized for large-scale data processing. Data is stored in columns rather than rows, which allows for efficient compression and encoding. The resulting files are highly compressed, which reduces the storage space and storage cost for large amounts of data. The columnar layout also makes retrieval efficient, because a query reads only the columns it needs. Parquet supports nested data structures as well, which makes it a good fit for complex data that originates as JSON.
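To make the row-versus-column distinction concrete, here is a small illustrative sketch. It is not how Parquet is implemented; it only shows the same records pivoted from a row layout into a column layout. The function name and the sample field names are hypothetical, and the sketch assumes every row has the same fields.

```typescript
type Row = { [field: string]: string | number };
type Columns = { [field: string]: (string | number)[] };

// Pivot row-oriented records into one array per field.
// Assumes all rows carry the same set of fields (illustration only).
function toColumns(rows: Row[]): Columns {
  const columns: Columns = {};
  rows.forEach((row) => {
    Object.keys(row).forEach((field) => {
      if (!columns[field]) {
        columns[field] = [];
      }
      columns[field].push(row[field]);
    });
  });
  return columns;
}
```

Values in one column share a type and often repeat, which is what lets a columnar format compress and encode them well.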
Overview
We will start by setting up an Amazon Simple Storage Service (Amazon S3) bucket to receive the CDRs from Amazon Chime SDK Voice Connector in JSON format. Then, we will create an AWS Lambda function that is triggered by S3 PUT events, which occur each time a CDR is written after a call completes. The Lambda function will read the CDR data in JSON format and send it to Amazon Kinesis Data Firehose for processing.
Next, we will create a Kinesis Data Firehose delivery stream that receives the records through direct PUTs from the Lambda function and converts them to Parquet format. We will use the buffer hints provided by Kinesis Data Firehose to optimize delivery of data to S3. By adjusting the buffer size and buffer interval, you can balance the cost of data delivery against the need for timely data processing: larger buffers produce fewer, larger objects, while smaller buffers deliver data sooner.
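The first part of the Lambda function's work can be sketched as two small helpers. This is a minimal sketch, not the function used in this post: the interface and function names are hypothetical, only the event fields that are needed are modeled, and the key prefix is taken from the CDR path shown later in this post. Reading the object from S3 and calling Kinesis Data Firehose are left out so that the example has no SDK dependency.

```typescript
// Minimal shape of the S3 event notification fields used here.
interface S3EventLike {
  Records: { s3: { bucket: { name: string }; object: { key: string } } }[];
}

interface ObjectRef {
  bucket: string;
  key: string;
}

// Key prefix for JSON CDRs, taken from the S3 path shown in this post.
const CDR_PREFIX = "Amazon-Chime-Voice-Connector-CDRs/json/";

// Return the bucket/key pairs in the event that point at CDR objects.
// S3 URL-encodes object keys in notifications and encodes spaces as "+".
function extractCdrObjects(event: S3EventLike): ObjectRef[] {
  return event.Records
    .map((record) => ({
      bucket: record.s3.bucket.name,
      key: decodeURIComponent(record.s3.object.key.replace(/\+/g, " ")),
    }))
    .filter((ref) => ref.key.indexOf(CDR_PREFIX) === 0);
}

// Re-serialize one CDR as a single JSON line so that records stay
// separable after they are concatenated in the delivery stream's buffer.
function toFirehosePayload(cdrJson: string): string {
  return JSON.stringify(JSON.parse(cdrJson)) + "\n";
}
```

In a real handler, each `ObjectRef` would be fetched from S3 and the resulting payload passed to the delivery stream with a PutRecord call.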
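The buffer trade-off can be reasoned about with simple arithmetic. The sketch below is a rough estimate under the assumption of steady traffic; `SizeInMBs` and `IntervalInSeconds` are the Firehose buffering hint fields, while the function names and the traffic figures are hypothetical. The permitted ranges for the hints depend on the destination and on whether format conversion is enabled, so check the Firehose documentation before picking values.

```typescript
// Field names follow the Firehose BufferingHints structure.
interface BufferingHints {
  SizeInMBs: number;
  IntervalInSeconds: number;
}

// Estimate how long a buffer is held before delivery. Firehose delivers
// when either threshold is reached, whichever comes first.
function estimateFlushSeconds(
  hints: BufferingHints,
  recordsPerSecond: number,
  avgRecordBytes: number
): number {
  const bytesPerSecond = recordsPerSecond * avgRecordBytes;
  if (bytesPerSecond <= 0) {
    return hints.IntervalInSeconds;
  }
  const secondsToFill = (hints.SizeInMBs * 1024 * 1024) / bytesPerSecond;
  return Math.min(secondsToFill, hints.IntervalInSeconds);
}

// Rough number of S3 objects written per day at a steady rate.
function estimateObjectsPerDay(
  hints: BufferingHints,
  recordsPerSecond: number,
  avgRecordBytes: number
): number {
  const secondsPerDay = 24 * 60 * 60;
  return Math.ceil(secondsPerDay / estimateFlushSeconds(hints, recordsPerSecond, avgRecordBytes));
}
```

For example, at 10 CDRs per second of about 1 KiB each, a 128 MB buffer would take hours to fill, so a 900-second interval decides the delivery time and yields roughly 96 objects per day. Lowering the interval delivers data sooner at the cost of more, smaller objects.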
To query the data stored in Parquet format, we will create an AWS Glue table that is dynamically partitioned by the timestamp of the CDR data. This will enable us to query the data in a more efficient manner, as we can filter by the timestamp of the CDR data. Finally, we will use Amazon Athena to query the data stored in the dynamically partitioned Glue table.
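One way to derive partition values from a CDR timestamp is sketched below. It assumes partitioning by UTC year, month, and day of the `EndTimeEpochSeconds` field; the partition key names and the function names are hypothetical, and the field is accepted as either a number or a numeric string because the exact JSON type is not specified here.

```typescript
interface PartitionKeys {
  year: string;
  month: string;
  day: string;
}

// Left-pad a number below 100 to two digits.
function pad2(n: number): string {
  return n < 10 ? "0" + n : String(n);
}

// Derive UTC date partition values from the CDR end time.
function partitionKeysFromCdr(cdr: { EndTimeEpochSeconds: number | string }): PartitionKeys {
  const seconds = Number(cdr.EndTimeEpochSeconds);
  if (!isFinite(seconds)) {
    throw new Error("EndTimeEpochSeconds is missing or not numeric");
  }
  const end = new Date(seconds * 1000);
  return {
    year: String(end.getUTCFullYear()),
    month: pad2(end.getUTCMonth() + 1),
    day: pad2(end.getUTCDate()),
  };
}
```

Using UTC keeps the partition for a given call stable regardless of where the code runs. A query that filters on these three values only has to read the matching partition.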
By using Lambda, Kinesis Data Firehose, and dynamically partitioned Glue tables, we can process the CDR data in near real time, transform it into Parquet format, query it cost-effectively, and gain valuable insights into voice communications.
Prerequisites
To implement the solution described in this blog post, you will need the following:
- An AWS account: You will need an AWS account to create and manage the necessary AWS resources.
- Amazon Chime SDK Voice Connector: You will need to have Amazon Chime SDK Voice Connector set up to capture Call Detail Records (CDR) data.
- Amazon S3 bucket: You will need an S3 bucket to store the CDR data in JSON format.
- AWS Lambda: You will need to create a Lambda function that is triggered by S3 PUT events, processes the CDRs, and PUTs them directly into the Kinesis Data Firehose delivery stream.
- Amazon Kinesis Data Firehose: You will need to create a Kinesis Data Firehose delivery stream to convert the CDRs to Parquet format.
- AWS IAM: You will need IAM roles and policies to grant necessary permissions to your Lambda function and Kinesis Data Firehose delivery stream.
- Knowledge of TypeScript: You will need to be familiar with TypeScript and Node.js programming to modify the example Lambda function provided in this blog post.
- Basic knowledge of Amazon Athena: You will need a basic understanding of Athena to query the Parquet data stored in S3.
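As an illustration of the IAM prerequisite, the following sketch builds a least-privilege policy document for the Lambda function. The function name, bucket name, and delivery stream ARN are placeholders; the actions shown (`s3:GetObject`, `firehose:PutRecord`, `firehose:PutRecordBatch`) cover reading CDR objects and writing to the delivery stream, and your setup may need more, for example permissions to write logs.

```typescript
interface PolicyStatement {
  Effect: "Allow";
  Action: string[];
  Resource: string[];
}

interface PolicyDocument {
  Version: "2012-10-17";
  Statement: PolicyStatement[];
}

// Build a policy that lets the function read CDR objects and
// put records into one delivery stream. Both arguments are placeholders.
function buildLambdaPolicy(bucketName: string, deliveryStreamArn: string): PolicyDocument {
  return {
    Version: "2012-10-17",
    Statement: [
      {
        Effect: "Allow",
        Action: ["s3:GetObject"],
        // Limit reads to the CDR prefix shown in this post.
        Resource: ["arn:aws:s3:::" + bucketName + "/Amazon-Chime-Voice-Connector-CDRs/json/*"],
      },
      {
        Effect: "Allow",
        Action: ["firehose:PutRecord", "firehose:PutRecordBatch"],
        Resource: [deliveryStreamArn],
      },
    ],
  };
}
```

The resulting object can be serialized with `JSON.stringify` and attached to the function's execution role.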
Walkthrough
To convert the CDR JSON objects into Parquet files and optimize Athena queries through data partitioning, we use Lambda and Kinesis Data Firehose.
Create a Kinesis Data Firehose delivery stream that is fed by a Lambda function. The Lambda function is triggered by an “Object Created” event in the specified S3 bucket. Kinesis Data Firehose buffers the incoming records for a defined interval and converts them to a Parquet file using the schema defined in the Glue Catalog. You can configure the delivery stream to partition the Parquet files in S3 based on specific fields in the CDR JSON objects, e.g. “EndTimeEpochSeconds”. This optimizes Athena queries by letting you query specific partitions of data instead of scanning the entire dataset.
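With dynamic partitioning, the delivery stream's S3 prefix refers to partition keys through Firehose namespace expressions such as `!{partitionKeyFromQuery:...}` or `!{partitionKeyFromLambda:...}`. The sketch below builds a Hive-style prefix from a list of key names. The helper name, the base prefix, and the key names are hypothetical, and how the keys are extracted from each record is configured separately on the delivery stream.

```typescript
type PartitionSource = "partitionKeyFromQuery" | "partitionKeyFromLambda";

// Build a Hive-style S3 prefix such as
//   base/year=!{partitionKeyFromQuery:year}/month=.../
// from a base prefix and a list of partition key names.
function buildPartitionedPrefix(
  basePrefix: string,
  keys: string[],
  source: PartitionSource
): string {
  const base = basePrefix.replace(/\/+$/, ""); // drop trailing slashes
  const parts = keys.map((key) => key + "=!{" + source + ":" + key + "}");
  return [base]
    .concat(parts)
    .filter((part) => part.length > 0)
    .join("/") + "/";
}
```

Hive-style `key=value` folder names are a common choice because Glue and Athena can map them directly to partition columns.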
The Glue Catalog provides a centralized metadata repository that makes it easy to discover, manage, and query data across multiple data sources. With the Glue Catalog, developers can define tables, partitions, and schemas for their data, making it easy to query and analyze data using Athena.
Validation
To validate our implementation, we will perform the following steps:
- Trigger the Lambda function by uploading sample CDR data to the S3 bucket. You can also use the CDRs that Amazon Chime SDK Voice Connector vends into your S3 bucket under:
s3://s3_bucket_name/Amazon-Chime-Voice-Connector-CDRs/json/VOICECONNECTORID/YEAR/MONTH/DATE/
- Check the Kinesis Data Firehose delivery stream for successful delivery of the CDR data in Parquet format. The Kinesis Data Firehose logs are created under:
/aws/kinesisfirehose/<delivery-stream-name>
- Finally, query the data stored in the Glue table using Athena. The query results should include the CDRs you uploaded.
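For the last step, a partition-filtered query is a quick check. The sketch below only builds the SQL text; it does not call Athena. The table name and the `year`, `month`, and `day` partition columns are assumptions for illustration, stored here as strings, so adjust them to match your Glue table.

```typescript
// Build an Athena query that counts the calls in one daily partition.
// Table and partition column names are placeholders for illustration.
function buildDailyCallCountQuery(
  table: string,
  year: string,
  month: string,
  day: string
): string {
  // Accept only simple identifiers and digits so that the values
  // can be inlined into the SQL text safely.
  if (!/^[A-Za-z_][A-Za-z0-9_]*$/.test(table)) {
    throw new Error("unexpected table name");
  }
  [year, month, day].forEach((value) => {
    if (!/^[0-9]+$/.test(value)) {
      throw new Error("partition values must be digits");
    }
  });
  return [
    "SELECT COUNT(*) AS call_count",
    "FROM " + table,
    "WHERE year = '" + year + "' AND month = '" + month + "' AND day = '" + day + "'",
  ].join("\n");
}
```

Because the filter is on partition columns, Athena only needs to scan the objects for that day, which keeps validation queries fast and inexpensive.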
Conclusion
Processing Amazon Chime SDK Voice Connector CDR data in near real time using AWS services is a powerful way to gain insights into your communication system. By using services like Kinesis Data Firehose, Lambda, and S3, you can capture, process, and store CDR data as calls complete. This allows you to monitor call quality, track usage patterns, and identify potential issues before they become major problems. With the right tools and strategies in place, you can take full advantage of the rich data available through Amazon Chime SDK Voice Connector CDRs and use it to improve your communication system and enhance the overall user experience.
There are several benefits to using Amazon Chime SDK Voice Connector CDR data instead of using a crawler or static table:
- Real-time data: Amazon Chime SDK Voice Connector CDR data is available in real-time, which means you can monitor and analyze call data as it happens. This is important for identifying issues and taking corrective action quickly.
- Rich data: Amazon Chime SDK Voice Connector CDR data provides rich information about each call, including call duration, call quality, and call participants. This data can be used to gain insights into usage patterns, identify trends, and optimize your communication system.
- Scalability: Amazon Chime SDK Voice Connector CDR data can be easily scaled to handle large volumes of data. This is important for organizations that have a high volume of calls and need to process data quickly.
To learn more about CDR data vended by Amazon Chime SDK Voice Connector, review the following resources: