A good analytics solution with strong database performance is a must-have for any software product. Otherwise, you have no idea how your product is performing.
Here are some classic questions you’ve probably asked yourself about your game:
- What do people like?
- What do they hate?
- What features do they use?
- Did you just spend several months developing a feature that no one uses?
- Why don’t they use it? Is it because it’s not useful? Or is it simply because they can’t find it?
And as much as you’d love to, you simply can’t ask your users all these questions.
Because let’s be honest: We’re all human, and we tend to forget details very quickly. I can hardly remember what I ate for breakfast yesterday.
So how can you expect me to remember some subliminal detail about a window I didn’t even know existed in an app I’m using? This is probably why I didn’t open this “amazing feature.”
Perhaps your thousands (maybe millions) of users are amazing (unlike me) and can remember every little detail of everything they did. Even so, it’s simply not feasible to ask them tons of questions every day.
Why It’s So Important to Store High-Resolution Data in Analytics
Yes, collecting data is important. But we need to clarify what we mean by high-resolution data.
Let’s start with something simple: collecting statistics on how many people start your game each day. If you see a sudden drop, you’ll start asking more questions, and you’ll need more data to answer them.
For example:
- Is there a specific place where users are dropping off?
- What did they do before that step?
- How are they different from people who didn’t drop off at that step?
Eventually, you’ll start collecting tons of information just in case you need to answer a question you haven’t even thought of yet. With high-resolution data, you’re better equipped to answer those unexpected questions.
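To make the drop-off questions concrete, here is a minimal Python sketch that counts how many users reach each step of a funnel and how many are lost along the way. The event tuples, the step names, and the `funnel_dropoff` helper are all made up for illustration; a real event schema will look different.

```python
from collections import defaultdict

# Hypothetical raw events: (user_id, step_name). Names are illustrative only.
EVENTS = [
    ("u1", "game_start"), ("u1", "tutorial"), ("u1", "level_1"),
    ("u2", "game_start"), ("u2", "tutorial"),
    ("u3", "game_start"),
    ("u4", "game_start"), ("u4", "tutorial"), ("u4", "level_1"),
]

# Hypothetical funnel: the ordered steps we expect a player to go through.
FUNNEL = ["game_start", "tutorial", "level_1"]


def funnel_dropoff(events, funnel):
    """Return a list of (step, users_reached, users_lost_since_previous_step)."""
    users_by_step = defaultdict(set)
    for user_id, step in events:
        users_by_step[step].add(user_id)

    report = []
    reached_previous = None
    for step in funnel:
        reached = users_by_step[step]
        if reached_previous is not None:
            # Only count users who also reached the previous step.
            reached = reached & reached_previous
        lost = 0 if reached_previous is None else len(reached_previous) - len(reached)
        report.append((step, len(reached), lost))
        reached_previous = reached
    return report


for step, reached, lost in funnel_dropoff(EVENTS, FUNNEL):
    print(f"{step}: reached={reached} lost={lost}")
```

Notice that this only works because every single event was kept. If you had stored just a daily total of game starts, the question "where exactly do they drop off?" could not be answered after the fact.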
Why Query Performance Matters in Analytics
Think of it this way: Everything costs money. The time spent by an analytics team member costs money. The hardware used to run a query costs money. But most importantly, the wait time costs money.
I don’t know about you, but I’d prefer to spend the money on productive things that actually move my company forward. That beats having entire teams sit around, waiting for a query to finish running.
Here’s how the investigation usually works:
- You look at some predefined dashboards and try to find correlations in the data that might indicate the problem and where it happens.
- Then, you start making hypotheses and try to verify or disprove them using the data you have.
- It can take hours (even days) to build and run the queries.
- Then, you learn that you asked the wrong question or forgot to add something to the database query.
- Then, you need to start all over again.
- Eventually, you might find that your hypothesis was wrong, make a new one, and repeat the process.
The faster your queries are executed, the quicker you can find and fix the root cause of the problem. This is much more productive than taking a long coffee break every time you press the “Run the Query” button.
Storage Considerations When Dealing with Big Data
When managing storage for big data, you need to strike a fine balance between costs, database performance, and utility. Here are some key considerations:
- While high-resolution data is priceless when it comes to answering critical questions, most of it might never be used.
- Many analysts are scared to delete data that might one day prove valuable. This can lead to inflated storage costs.
- Unstructured and disorganized data can also impact performance, causing inefficient analytics.
To address these issues, you need to implement efficient storage strategies. These include:
- Prioritizing actionable information
- Compressing data
- Archiving leftover data
With these methods, you can reduce costs without compromising on database performance. And you can still extract insights when necessary.
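As a rough illustration of the "compressing data" point, the sketch below stores events column by column, delta-encodes the timestamps, and compresses the result with Python’s standard `zlib` module. It is a generic technique, not a description of how any database mentioned in this article stores its data. The event layout, the sample data, and the `pack_events` / `unpack_events` helpers are assumptions made for the example.

```python
import json
import zlib


def pack_events(events):
    """Compress a list of (timestamp, user_id, event_name) tuples.

    Data is laid out column by column and timestamps are delta-encoded,
    which tends to make the bytes more repetitive and easier to compress.
    """
    deltas = []
    previous = 0
    for timestamp, _, _ in events:
        deltas.append(timestamp - previous)  # first entry holds the full timestamp
        previous = timestamp
    columns = {
        "ts_delta": deltas,
        "user_id": [user for _, user, _ in events],
        "event": [name for _, _, name in events],
    }
    return zlib.compress(json.dumps(columns).encode("utf-8"), 9)


def unpack_events(blob):
    """Reverse pack_events: decompress and rebuild the original tuples."""
    columns = json.loads(zlib.decompress(blob).decode("utf-8"))
    events = []
    timestamp = 0
    for delta, user, name in zip(columns["ts_delta"], columns["user_id"], columns["event"]):
        timestamp += delta
        events.append((timestamp, user, name))
    return events


# Hypothetical sample data: 10,000 synthetic events, three seconds apart.
EVENT_NAMES = ["game_start", "level_start", "level_end", "purchase"]
events = [(1_700_000_000 + i * 3, i % 50, EVENT_NAMES[i % 4]) for i in range(10_000)]

raw = json.dumps(events).encode("utf-8")
packed = pack_events(events)
print(f"raw JSON: {len(raw)} bytes, packed: {len(packed)} bytes")

# Nothing is lost: the packed form restores every event exactly.
assert unpack_events(packed) == events
```

Synthetic data like this compresses far better than real traffic would, so treat the printed sizes as a demonstration of the idea, not as a prediction for your own dataset.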
Benchmark of Popular Database Solutions Used in Analytics
When selecting a database solution for your analytics, performance and storage are two key factors to consider.
Below is a benchmark of popular solutions based on a simple SELECT query on a dataset of 265 million events. Each query time is the average over five SELECT queries. We were also curious about how much space the data would take on disk, so we measured that as well.
Database | Simple Query Time (265M Events) | Data Size on Disk |
Keewano | 0.5 seconds | 1.4 GB |
Google Cloud (BigQuery) | >200 minutes | 7.4 GB |
InfluxDB | ~55 minutes | 1.65 GB |
Snowflake | ~110 minutes | 1.9 GB |
MySQL | ~13 minutes | 11 GB |
ClickHouse | ~5 minutes | 7.9 GB |
Key Takeaways:
- Keewano leads in both query speed and data size. Its high database performance and storage efficiency make it ideal for high-performance analytics.
- While not as efficient, ClickHouse is the second-best overall option on the list. It has the second-fastest query time (~5 minutes), but at 7.9 GB its footprint on disk is the second largest, so you need to pay attention to storage.
- Although Snowflake has the third-smallest data size on disk (1.9 GB), its query time is the second longest.
- With the longest query time and one of the largest data storage footprints, Google Cloud (BigQuery) is the weakest option overall.
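If you want to run a similar measurement on your own data, the averaging step can be sketched in a few lines. The example below uses Python’s built-in `sqlite3` module purely because it needs no setup. SQLite is not one of the databases in the table, this is not the harness that produced the numbers above, and it assumes the five queries are repeated runs of one statement. The `events` table, its columns, the row count, and the `average_query_seconds` helper are hypothetical.

```python
import sqlite3
import time


def average_query_seconds(connection, sql, runs=5):
    """Run the same query `runs` times and return the mean wall-clock time."""
    timings = []
    for _ in range(runs):
        started = time.perf_counter()
        connection.execute(sql).fetchall()  # fetch all rows so the query fully executes
        timings.append(time.perf_counter() - started)
    return sum(timings) / len(timings)


# Hypothetical in-memory table with 100,000 synthetic events.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE events (user_id INTEGER, name TEXT, ts INTEGER)")
connection.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    ((i % 1000, "level_start" if i % 2 else "game_start", i) for i in range(100_000)),
)
connection.commit()

QUERY = "SELECT COUNT(*) FROM events WHERE name = 'game_start'"
mean_seconds = average_query_seconds(connection, QUERY)
print(f"average over 5 runs: {mean_seconds:.4f} s")
```

Averaging several runs smooths out one-off effects such as a cold cache on the first execution. For a fair comparison between systems, you would also want the same dataset, the same query, and comparable hardware for each one.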
It’s Time to Speed Up Your Database Performance
Someone once said to me, “Your data is not that big; you just store it and process it wrong.” This phrase couldn’t be truer today. To all the analysts out there: You need high-resolution data, efficient storage, and fast queries. By optimizing how you handle your data, you can unlock smarter, faster insights.
20+ years of expertise in game engine and gameplay programming, 3D/2D graphics, AI, networking, performance optimization, multi-platform systems, and all other aspects of game development.
Languages include C/C++, C#, LISP, Java, Python, x86 assembly, ARM assembly and a bit of everything else.