Merged
12 changes: 11 additions & 1 deletion src/Processors/Formats/Impl/ParquetFileMetaDataCache.cpp
@@ -2,11 +2,21 @@

#if USE_PARQUET

#include <parquet/metadata.h>

namespace DB
{

size_t ParquetFileMetaDataWeightFunction::operator()(const parquet::FileMetaData & metadata) const
{
/// TODO fix-me: using the size on disk is not ideal, but it is the simplest and best we can do for now.

Why is it not ideal? As the comment in CacheBase says, the functor must return the "approximate size" of the data.
Or do you mean that the file size is not equal to the disk space actually used, and should be rounded up to the file system block size?

Collaborator Author

I think the size matches the size on disk. I call it not ideal because, IMO, the ideal weight would be the size of the in-memory data structures.

/// this implementation is only used by the v1 reader, which is going to be deprecated and a new implementation for the v3
/// reader will be added in the future.
return metadata.size();
}

ParquetFileMetaDataCache::ParquetFileMetaDataCache()
- : CacheBase<String, parquet::FileMetaData>(CurrentMetrics::end(), CurrentMetrics::end(), 0)
+ : CacheBase<String, parquet::FileMetaData, std::hash<String>, ParquetFileMetaDataWeightFunction>(CurrentMetrics::end(), CurrentMetrics::end(), 0)
{}

ParquetFileMetaDataCache * ParquetFileMetaDataCache::instance()
7 changes: 6 additions & 1 deletion src/Processors/Formats/Impl/ParquetFileMetaDataCache.h
@@ -16,7 +16,12 @@ class FileMetaData;
namespace DB
{

- class ParquetFileMetaDataCache : public CacheBase<String, parquet::FileMetaData>
+ struct ParquetFileMetaDataWeightFunction
+ {
+ size_t operator()(const parquet::FileMetaData & metadata) const;
+ };
+
+ class ParquetFileMetaDataCache : public CacheBase<String, parquet::FileMetaData, std::hash<String>, ParquetFileMetaDataWeightFunction>
{
public:
static ParquetFileMetaDataCache * instance();
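For readers following the review thread, here is a minimal sketch of why the weight functor matters for eviction. It is not ClickHouse's CacheBase: `TinyWeightedCache`, `FakeMetaData`, `FakeMetaDataWeight` and `UnitWeight` are hypothetical names made up for illustration, the sketch is single-threaded, and it does not model the `CurrentMetrics` arguments or the initial limit of `0` passed in the diff. It also assumes that a cache without a custom weight function counts every entry equally, which the diff itself does not state.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <list>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for parquet::FileMetaData; only size() matters here.
struct FakeMetaData
{
    std::size_t serialized_size = 0;
    std::size_t size() const { return serialized_size; }
};

// Same shape as ParquetFileMetaDataWeightFunction in the diff.
struct FakeMetaDataWeight
{
    std::size_t operator()(const FakeMetaData & metadata) const { return metadata.size(); }
};

// Assumed default: every entry weighs 1, so the limit is an entry count.
template <typename T>
struct UnitWeight
{
    std::size_t operator()(const T &) const { return 1; }
};

// Hypothetical single-threaded LRU cache with a pluggable weight functor.
template <typename Key, typename Mapped, typename Hash = std::hash<Key>, typename Weight = UnitWeight<Mapped>>
class TinyWeightedCache
{
public:
    explicit TinyWeightedCache(std::size_t max_weight_) : max_weight(max_weight_) {}

    void set(const Key & key, std::shared_ptr<Mapped> value)
    {
        remove(key); // replace any previous entry for this key
        const std::size_t w = Weight{}(*value);
        lru.push_front(Entry{key, std::move(value), w});
        index[key] = lru.begin();
        current_weight += w;
        // Evict least recently used entries until the total fits the limit.
        while (current_weight > max_weight && !lru.empty())
        {
            const Key victim = lru.back().key; // copy: the entry is about to be destroyed
            remove(victim);
        }
    }

    std::shared_ptr<Mapped> get(const Key & key)
    {
        auto it = index.find(key);
        if (it == index.end())
            return nullptr;
        lru.splice(lru.begin(), lru, it->second); // mark as most recently used
        return it->second->value;
    }

    std::size_t weight() const { return current_weight; }
    std::size_t count() const { return lru.size(); }

private:
    struct Entry
    {
        Key key;
        std::shared_ptr<Mapped> value;
        std::size_t weight;
    };

    void remove(const Key & key)
    {
        auto it = index.find(key);
        if (it == index.end())
            return;
        current_weight -= it->second->weight;
        lru.erase(it->second);
        index.erase(it);
    }

    std::size_t max_weight;
    std::size_t current_weight = 0;
    std::list<Entry> lru;
    std::unordered_map<Key, typename std::list<Entry>::iterator, Hash> index;
};

// Usage: a 100-byte budget holds two 40-byte entries but not three.
inline void example()
{
    TinyWeightedCache<std::string, FakeMetaData, std::hash<std::string>, FakeMetaDataWeight> cache(100);
    cache.set("a.parquet", std::make_shared<FakeMetaData>(FakeMetaData{40}));
    cache.set("b.parquet", std::make_shared<FakeMetaData>(FakeMetaData{40}));
    cache.set("c.parquet", std::make_shared<FakeMetaData>(FakeMetaData{40}));
    assert(cache.get("a.parquet") == nullptr); // evicted as least recently used
    assert(cache.weight() == 80);
}
```

With `UnitWeight`, the same limit of 100 would mean "at most 100 entries" regardless of how large each entry is. With a size-based functor, the limit behaves like a byte budget. That is the practical difference the fourth template argument makes in the diff, under the assumptions stated above.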