Cassandra DB中的日期插入:非平凡的1h班次问题

I am a bit desperate about this problem... I have no idea how to face it.

Here is a simpler way to look at this problem:

If my insert cql query is:

"BEGIN BATCH USING CONSISTENCY ONE insert into my_table(id,'2014-04-11 8:00:00',...,'2014-04-15 10:00:00') values ('2036548',3.15,...,4.11) APPLY BATCH"

...and my data request cql query is:

"Select FIRST 100000 '2014-04-01 0:00:00'..'2014-04-16 0:00:00' from my_table where id=2036548"

...why does the inserted date 2014-04-15 10:00:00 changes to 2014-04-15 11:00:00 when pullling it from Cassandra?

The date pulling code in vb.net is:

Public Shared Function getCassandraDate(ByVal value As Byte()) As Date
    Dim buffer As Byte() = New Byte(value.Length - 1) {}
    value.CopyTo(buffer, 0)
    Array.Reverse(buffer)
    Dim ticks As Long = BitConverter.ToInt64(buffer, 0)
    Dim dateTime As New System.DateTime(1970, 1, 1, 0, 0, 0, _
         0)

    dateTime = dateTime.AddMilliseconds(ticks)
    Return dateTime.ToLocalTime
End Function

...same thing in PHP:

date_default_timezone_set("Europe/Paris");
$time = $this->unpackDate($packed_time);
$str_time = date('Y-m-d H:i:s',$time); //TODO : to local time

private function unpackDate($data, $is_name=null)
{
    $arr = unpack('N2', $data);

    // If we are on a 32bit architecture we have to explicitly deal with
    // 64-bit twos-complement arithmetic since PHP wants to treat all ints
    // as signed and any int over 2^31 - 1 as a float
    if (PHP_INT_SIZE == 4) {

        $hi = $arr[1];
        $lo = $arr[2];
        $isNeg = $hi  < 0;

        // Check for a negative
        if ($isNeg) {
            $hi = ~$hi & (int)0xffffffff;
            $lo = ~$lo & (int)0xffffffff;

            if ($lo == (int)0xffffffff) {
                $hi++;
                $lo = 0;
            } else {
                $lo++;
            }
        }

        // Force 32bit words in excess of 2G to pe positive - we deal wigh sign
        // explicitly below

        if ($hi & (int)0x80000000) {
            $hi &= (int)0x7fffffff;
            $hi += 0x80000000;
        }

        if ($lo & (int)0x80000000) {
            $lo &= (int)0x7fffffff;
            $lo += 0x80000000;
        }

        $value = $hi * 4294967296 + $lo;

        if ($isNeg)
            $value = 0 - $value;

    } else {
        // Upcast negatives in LSB bit
        if ($arr[2] & 0x80000000)
            $arr[2] = $arr[2] & 0xffffffff;

        // Check for a negative
        if ($arr[1] & 0x80000000) {
            $arr[1] = $arr[1] & 0xffffffff;
            $arr[1] = $arr[1] ^ 0xffffffff;
            $arr[2] = $arr[2] ^ 0xffffffff;
            $value = 0 - $arr[1]*4294967296 - $arr[2] - 1;
        } else {
            $value = $arr[1]*4294967296 + $arr[2];
        }
    }
    return $value / 1e3;
}

MORE DETAILS

Processing chain:

(1). insertion to Cassandra through .NET

(2). Cassandra data storage

(3). Pulling the data from PHP or .NET

Problem:

As for today, a date being 2014-04-15 10:00:00 in step (1), will come out as 2014-04-15 11:00:00 in step (3).

Details:

(regarding the date format in this chain)

(1). Local time in .NET (Timezone: "Europe/Paris"). Insertion cql that is being executed: "BEGIN BATCH USING CONSISTENCY ONE insert into my_table(id,'2014-04-11 8:00:00',...,'2014-04-15 10:00:00') values ('2036548',3.15,...,4.11) APPLY BATCH"

(2). ??? I don't know what Cassandra does here... ???

(3). Example of cql query to pull the data: "Select FIRST 100000 '2014-04-01 0:00:00'..'2014-04-16 0:00:00' from my_table where id=2036548". In php: date_default_timezone_set("Europe/Paris"); $str_time = date('Y-m-d H:i:s',$time);. In .NET: dateTime.ToLocalTime.

Extra info:

I think it worked well before the daylight saving time change some weeks ago. But I can not be sure about that.

If in step (1), if I changed the date to de date to UTC before inserting it, 2014-04-15 10:00:00 will become 2014-04-15 08:00:00 and the output will be 2014-04-15 09:00:00, which is still not correct.

I highly suspect that the trick here is between steps (1) and (2), that is to say, me not being able to understand how Cassandra treats dates.


Edit1:

@Ananth 's questions:

both cassandra and client run in the same datacenter?

It is complicated:

  • Insertion in .NET from server1, a different server from server-cassandra (datacenter).
  • PHP (to pull the data) running on server-cassandra.
  • .NET (to pull the data) running on server1, not on server-cassandra.
  • PHP and .NET pulling the same result.

Can you post your schema here?

Here it is

CREATE TABLE tsmeasures (
  id int PRIMARY KEY
) WITH
  comment='' AND
  comparator=timestamp AND
  read_repair_chance=0.100000 AND
  gc_grace_seconds=0 AND
  default_validation=double AND
  min_compaction_threshold=4 AND
  max_compaction_threshold=32 AND
  replicate_on_write='true' AND
  compaction_strategy_class='SizeTieredCompactionStrategy' AND
  compression_parameters:sstable_compression='SnappyCompressor';

Edit2:

After testing it step by step, this is the result:

  • real date : 2014-04-15 17:00:00 (localtime)
  • cql text : '2014-04-15 15:00:00' (to UTC, done through .NET)
  • PHP Cassandra Unpack of this date => $ticks = 1397577600 (*) The unpack is done with the piece of code shown before

Ticks converted (through http://www.epochconverter.com/ )

  • GMT: Tue,
  • 15 Apr 2014 16:00:00 GMT Your time zone: 4/15/2014 6:00:00 PM GMT+2

These results makes no sense to me...

More details:

cql insert:

"BEGIN BATCH USING CONSISTENCY ONE insert into tsmeasures(id,'2014-04-11 15:00:00',...,'2014-04-15 15:00:00') values ('2036548',0,...,4.85) APPLY BATCH"

cql fetch:

"SELECT '2014-04-10 16:00:00'..'2014-04-20 17:00:00' FROM tsmeasures WHERE id IN (2036548,2036479,2036174,650877)"

Thus '2014-04-15 15:00:00' is included in the range of the fetch, and I can identify it because it is the highest value.

I will keep digging...

Before Edit

Is there a clock time sync problem between your client and cassandra? I would strictly recommend running NTP between your client and cassandra installation.

Post Edit

CREATE TABLE tsmeasures (
  id int PRIMARY KEY
) WITH
  comment='' AND
  comparator=timestamp AND
  read_repair_chance=0.100000 AND
  gc_grace_seconds=0 AND
  default_validation=double AND
  min_compaction_threshold=4 AND
  max_compaction_threshold=32 AND
  replicate_on_write='true' AND
  compaction_strategy_class='SizeTieredCompactionStrategy' AND
  compression_parameters:sstable_compression='SnappyCompressor';

From what you have given , it looks like you are trying to get the insertion time .

Your problem might be due to clients running in different clock cycles with respect to cassandra. Cassandra just places a unix timestamp for each write.

So what is happening here from what i see.

You write from client using timestamp X(datastax driver sets this insertion timestamp). Cassandra writes with X.

You read with a timestamp Y. Cassandra tries to read with timestamp Y(So , as per your explanation, PHP client is there in a different location).

Both a are bound to differ.

Solution 1

Try to have a global NTP between the entire set up so that client clock cycles are in sync with cassandra.

Solution 2

Insert a column named timestamp which is user driven and do a range scan based on that

Solution 3

Set the insertion time in DML operations.

This seems to be a time zone issue. It appears you are neither specifying a timezone when storing nor when retrieving the timestamps. According to the documentation Cassandra applies the timezone of the coordinator node handling the write request if no timezone is supplied by the client. If timestamps shift between writing and reading them, that probably means all or some of your Cassandra nodes are not configured for the same timezone as your client is.